Filter Out Facet Results

2015-08-10 Thread Paden
Hello, I'm trying to figure out how to filter particular facets out of my results. I'm doing some Named Entity Extraction and putting the entities up as faceting information. However, not all the results I get are exact. For example, the string w 5th street will appear in the Person facet list. These
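One workaround is to drop unwanted values on the client side, after Solr returns the facet counts and before they are rendered. This is a minimal sketch that assumes the facet counts have already been read into a map. The class name, the stoplist and the address-like pattern are all hypothetical heuristics, not anything provided by Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical client-side helper: drops unwanted facet values after Solr returns them.
class FacetFilter {
    // Rough heuristic for address-like strings such as "w 5th street".
    private static final Pattern STREET =
            Pattern.compile("(?i)\\b\\d+(st|nd|rd|th)\\s+(street|st|avenue|ave|road|rd)\\b");

    // Returns the facet counts minus stoplisted or address-like values,
    // keeping the order in which Solr returned them.
    static Map<String, Long> filter(Map<String, Long> counts, Set<String> stoplist) {
        Map<String, Long> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            String value = entry.getKey();
            if (stoplist.contains(value.toLowerCase())) {
                continue; // explicitly blocked value (stoplist entries are lower-case)
            }
            if (STREET.matcher(value).find()) {
                continue; // looks like an address rather than a person
            }
            kept.put(value, entry.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> personFacet = new LinkedHashMap<>();
        personFacet.put("Snowman, Frosty T.", 12L);
        personFacet.put("w 5th street", 7L);
        personFacet.put("Acme Corp", 3L);
        // Prints {Snowman, Frosty T.=12}
        System.out.println(filter(personFacet, Set.of("acme corp")));
    }
}
```

Because this runs after the query, the counts stored in Solr are unchanged. Cleaning the entities during the NER step at index time would be the more thorough fix.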

Using Update Request Handlers with Solr

2015-07-29 Thread Paden
Hello all, I've been trying to integrate NER into my Solr search so I can get some really good facets out of it. I've already managed to plug in a search handler with code from searchbox.com to get a feel for how it works, and now I'm trying to plug in an update request processor so I can pull

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-27 Thread Paden
Pretty old thread, I know. But in the end it wasn't Solr; I'm fairly certain that it was Tika. The autoparser wasn't pulling any of the .doc file text. It came out blank. The documents were Word 1997-2003 files. When I opened them in Word 2010 and RESAVED them as 2010 documents they indexed just
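If legacy files are the suspect, one way to find them before indexing is to check each file's leading bytes. Word 97-2003 `.doc` files are OLE2 compound documents that start with the signature `D0 CF 11 E0 A1 B1 1A E1`. Word 2007+ `.docx` files are ZIP archives that start with `PK\x03\x04`. This is a sketch with a hypothetical class name. It only flags which files might need re-saving. It does not explain why Tika returned blank text for them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical pre-indexing check: classifies Office files by their leading "magic" bytes.
class WordFormatSniffer {
    // OLE2 compound file signature (legacy .doc, .xls, .ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4" (.docx and other OOXML files).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] head, byte[] magic) {
        return head.length >= magic.length
                && Arrays.equals(Arrays.copyOf(head, magic.length), magic);
    }

    static String classify(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2";
        if (startsWith(head, ZIP)) return "ZIP";
        return "unknown";
    }

    static String classify(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return classify(in.readNBytes(8)); // reads at most 8 bytes (Java 11+)
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass file paths on the command line; OLE2 files are the Word 97-2003 candidates.
        for (String arg : args) {
            System.out.println(arg + " -> " + classify(Path.of(arg)));
        }
    }
}
```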

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
Okay. I'm going to run the index again with the specifications you recommended. This could take a few hours, but I will post the entire trace for that error when it pops up again and let you guys know the results of increasing the heap size. -- View this message in context:

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
Hey Shawn, when I use the -m 2g option in my script I get the error 'cannot open [path]/server/logs/solr.log for reading: No such file or directory'. I don't see how that option would affect this.

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
There are some zip files inside the directory that are also referenced in the database. I'm thinking those are the ones it's jumping right over. They are not the issue; at least I'm 95% sure. And Shawn, if you're still watching, sorry: I'm using solr-5.1.0.

Re: Data Import Handler Stays Idle

2015-07-20 Thread Paden
Yes, the number of unimported matches. No, I did not set commit to false on any of my DataImportHandler requests. Since it defaults to true, I didn't really take it into account.
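Passing `commit` explicitly on the full-import request takes the default out of the equation. `command=status` reports the same counters as the status output quoted in the original post. This minimal sketch only builds the request URLs. It assumes the handler is registered at `/dataimport`, and the core name `mycore` is a placeholder.

```java
import java.net.URI;

// Sketch: builds DataImportHandler request URLs. Assumes the handler is
// registered at /dataimport; the core name passed in is a placeholder.
class DihUrls {
    // Full import with the commit flag spelled out instead of relying on the default.
    static URI fullImport(String solrBase, String core, boolean commit) {
        return URI.create(solrBase + "/" + core
                + "/dataimport?command=full-import&commit=" + commit);
    }

    // Status request; poll this while the import runs to watch the counters move.
    static URI status(String solrBase, String core) {
        return URI.create(solrBase + "/" + core + "/dataimport?command=status&wt=json");
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(fullImport(base, "mycore", true));
        System.out.println(status(base, "mycore"));
    }
}
```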

Data Import Handler Stays Idle

2015-07-20 Thread Paden
Hello, I'm currently trying to index about 54,000 files with the Solr Data Import Handler and I've got a small problem. It fetches about half (28,289) of the 54,000 files and processes about 14,146 documents before it stops and just sits idle. Here's the status output { responseHeader: {

Re: Data Import Handler Stays Idle

2015-07-20 Thread Paden
I was consistently checking the logs to see if there were any errors that would explain the idling. There were no errors except for a few skipped documents due to some illegal IOExceptions from Tika, but none of those occurred around the time that Solr began idling. A lot of font warnings. But

DIH Not Indexing Two Documents

2015-07-15 Thread Paden
Hello, I've run into quite the snag and I'm wondering if anyone can help me out here. Here's the situation: I am using the DataImportHandler to pull from a database and a Linux file system. The database has the metadata; the file system has the document text. I thought it had indexed all the files I

Re: DIH Not Indexing Two Documents

2015-07-15 Thread Paden
That should be authors 280 and 281. Sorry. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-Not-Indexing-Two-Documents-tp4217546p4217547.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: DIH Not Indexing Two Documents

2015-07-15 Thread Paden
You were 100 percent right. I went back and checked the metadata for multiple instances of the same file path. Both of the files had an extra set of metadata with the same file path. Thank you very much.
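When the file path is the document's uniqueKey, two metadata rows with the same path collapse into a single Solr document, because the later add overwrites the earlier one. A quick check before the import can catch this. The sketch below assumes the metadata rows have already been read into memory as (author, file path) pairs. The record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-import check for metadata rows that share a file path.
class DuplicatePathCheck {
    // One metadata row as read from the database (fields are illustrative).
    record Row(String author, String filePath) {}

    // Groups rows by file path and keeps only paths that occur more than once.
    static Map<String, List<Row>> duplicates(List<Row> rows) {
        Map<String, List<Row>> byPath = new LinkedHashMap<>();
        for (Row row : rows) {
            byPath.computeIfAbsent(row.filePath(), k -> new ArrayList<>()).add(row);
        }
        byPath.values().removeIf(group -> group.size() < 2);
        return byPath;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("Author A", "/docs/a.pdf"),
                new Row("Author B", "/docs/b.pdf"),
                new Row("Author C", "/docs/a.pdf"));
        // With the path as uniqueKey, only one of the /docs/a.pdf rows would end up indexed.
        duplicates(rows).forEach((path, group) ->
                System.out.println(path + " appears " + group.size() + " times"));
    }
}
```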

Highlighting pre and post tags not working

2015-07-13 Thread Paden
Hello, I'm trying to get some Solr highlighting going but I've run into a small problem. When I set the pre and post tags to my own custom tag I get an XML error: XML Parsing Error: mismatched tag. Expected: </em>. Location: file:///home/paden/Downloads/solr-5.1.0/server/solr/Testcore2/conf
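A "mismatched tag" error is what an XML parser reports when markup such as `<b>` is typed literally inside an XML text node. In `solrconfig.xml`, the tag has to be escaped (`&lt;b&gt;`) or wrapped in CDATA. When the tags are sent as request parameters instead, they have to be URL-encoded. The sketch below shows both transformations, using the `hl.simple.pre`/`hl.simple.post` parameters of the standard highlighter. The class name and the `text` field are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: prepares custom highlight tags for the two places they usually go.
class HighlightTags {
    // Escapes markup so it can sit inside an XML text node, e.g.
    // <str name="hl.simple.pre">...</str> in solrconfig.xml.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds the highlighting part of a /select query string with URL-encoded tags.
    static String hlParams(String field, String pre, String post) {
        return "hl=true&hl.fl=" + enc(field)
                + "&hl.simple.pre=" + enc(pre)
                + "&hl.simple.post=" + enc(post);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+
    }

    public static void main(String[] args) {
        System.out.println(xmlEscape("<b>") + " ... " + xmlEscape("</b>"));
        System.out.println(hlParams("text", "<b>", "</b>"));
    }
}
```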

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-09 Thread Paden
Haha, no need to reinvent the wheel, especially when you don't know Java. It's just a prototype anyway. I made a very strong assumption that it was pulling the text as blank because I would copy the EXACT same text from one file in the file system and put it into another file under a different name, but

SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-09 Thread Paden
    {
        TikaSqlIndexer idxer = new TikaSqlIndexer("http://localhost:8983/solr/Testcore3");
        //idxer.Index();
        idxer.doTikaDocuments(new File("/home/paden/Documents/LWP_Files/BIGDATA"));
    } catch

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-09 Thread Paden
I posted the code anyway; I just forgot to remove that line from the post. Sorry.

Re: Search Handler Question

2015-07-08 Thread Paden
Awesome. This looks like a great resource. Thanks!

Search Handler Question

2015-07-08 Thread Paden
Hello, I've been trying to tune my search handler to get better search results, and I have a general question about the search handler. This being the first time I've designed/implemented a search engine, I've been told that other engines operate on a kind of layered search. By

Can I instruct the Tika Entity Processor to skip the first page using the DIH?

2015-07-08 Thread Paden
Hello, I'm using the DIH to import some files from one of my local directories. However, every single one of these files has the same first page, so I want to skip that first page in order to optimize search. Can this be accomplished by an instruction within the DataImportHandler or, if not, how

Solr edismax always using the default fields?

2015-07-07 Thread Paden
Hello, I'm trying to tune a search handler to get the results that I want. In solrconfig.xml I specify several different query fields for the edismax query parser, but it always seems to use the default fields instead. For example, when I remove Author from the df list of
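One thing worth checking is whether the request actually reaches edismax (`defType=edismax`) and carries `qf`. The standard lucene parser ignores `qf` and searches `df`, and `debugQuery=true` shows which fields the parsed query really used. The sketch below builds such a query string. `Author` and `text` come from this thread. The class name and the boost values are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: builds an edismax query string with explicit, boosted query fields.
class EdismaxQuery {
    static String build(String userQuery, Map<String, Integer> fieldBoosts) {
        // qf is a space-separated list of field^boost entries.
        String qf = fieldBoosts.entrySet().stream()
                .map(e -> e.getKey() + "^" + e.getValue())
                .collect(Collectors.joining(" "));
        return "defType=edismax"
                + "&q=" + enc(userQuery)
                + "&qf=" + enc(qf)
                + "&debugQuery=true"; // the debug section shows the parsed query per field
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("Author", 3); // hypothetical boost
        boosts.put("text", 1);
        System.out.println(build("Snowman, Frosty T.", boosts));
    }
}
```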

Re: Solr edismax always using the default fields?

2015-07-07 Thread Paden
Well, I've just been using an author's name: Last Name, First Name Middle Initial, like *Snowman, Frosty T.* As for the debugging, I'm not really seeing anything that would help me understand why the query fields aren't kicking in and only the default fields are. I do see that it is parsing

Re: Solr edismax always using the default fields?

2015-07-07 Thread Paden
It just defaults to text anyway. I removed it entirely from the solrconfig and never specified it in the Solr query, but it still defaults to text.

Re: Solr edismax always using the default fields?

2015-07-07 Thread Paden
Thank you! Thank you, thank you, thank you. That worked and it brought the right results. Thanks. It was driving me crazy.

Using Facets to Limit the Scope of a Search

2015-07-01 Thread Paden
Hello, I feel like this is a really basic question but I'm struggling to find the answer. I'm trying to figure out what HTTP request would limit the scope of a search based on a facet. Say I performed a query and the facet field request returns the top ten authors by facet count and
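In Solr, drilling down on a facet value is usually done with a filter query (`fq`) on the facet field. A filter query narrows the result set without changing how the main query is scored. The sketch below builds such a request string. It quotes the facet value so that spaces and commas survive the query parser. The field name `Author` is borrowed from elsewhere in this thread and is assumed to be a non-tokenized field. The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: builds a "drill down on one facet value" query string.
class FacetDrillDown {
    // Wraps a facet value in quotes, escaping backslashes and quotes for the query parser.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Keeps the original query and adds an fq on the chosen facet value.
    static String query(String q, String field, String facetValue) {
        return "q=" + enc(q)
                + "&fq=" + enc(field + ":" + quote(facetValue))
                + "&facet=true&facet.field=" + enc(field);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(query("engine", "Author", "Snowman, Frosty T."));
    }
}
```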

Using the DataImportHandler to get filepath from MySQL DataBase BackSlash Character Problem

2015-06-30 Thread Paden
Hello, I'm having a slight Catch-22 going on with my Solr indexing process. I'm using the DataImportHandler to pull a file path from a database. The problem is that Windows file paths have the backslash character inside them: \\some\filepath So when I insert this data into MySQL
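In MySQL's default SQL mode, a backslash inside a quoted string literal starts an escape sequence. An unrecognized sequence such as `\f` just becomes `f`, so `\\some\filepath` is stored as `\somefilepath` unless every backslash is doubled. Binding the value through a JDBC `PreparedStatement` avoids the problem entirely. If the SQL is generated as text instead, here is a sketch of the escaping. The helper name and the `files`/`path` table and column are hypothetical.

```java
// Hypothetical helper for building a MySQL string literal by hand.
class MySqlLiteral {
    // Doubles backslashes and single quotes. Assumes the server is not running
    // with the NO_BACKSLASH_ESCAPES SQL mode (in that mode backslashes are literal).
    static String quote(String raw) {
        String escaped = raw.replace("\\", "\\\\").replace("'", "''");
        return "'" + escaped + "'";
    }

    public static void main(String[] args) {
        String windowsPath = "\\\\some\\filepath"; // Java spelling of \\some\filepath
        // Prints: INSERT INTO files (path) VALUES ('\\\\some\\filepath');
        System.out.println("INSERT INTO files (path) VALUES (" + quote(windowsPath) + ");");
    }
}
```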

Creating A User Interface On Top of Solr

2015-06-23 Thread Paden
Hello, I'm trying to build my own custom Solr interface in Visual Studio instead of using/modifying the original Velocity interface. I'm mostly doing this as a learning exercise in building a UI, which is why I'm opting out of using it. The problem is I'm pretty new and not sure where to begin.

Re: Connecting to a Solr server remotely

2015-06-22 Thread Paden
That did it, Shawn. Thanks for the help!

Connecting to a Solr server remotely

2015-06-22 Thread Paden
Hello, I've set up a Solr server on my Linux virtual machine. Now I'm trying to access it remotely from my Windows machine using an HTTP request from a browser. Any time I try to access it with a request such as http://localhost:8983/solr I always get a connection error (with the server running
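From the Windows side, `localhost` refers to the Windows machine itself, not the VM. The request therefore has to target the VM's address. A small TCP reachability probe can help separate network problems from Solr problems. This is a sketch. The class name is hypothetical, and the VM address in `main` is a placeholder.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: checks whether a TCP port accepts connections, independent of Solr itself.
class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unknown
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "192.168.56.101"; // placeholder VM address
        System.out.println(host + ":8983 reachable? " + reachable(host, 8983, 2000));
    }
}
```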

RE: Connecting to a Solr server remotely

2015-06-22 Thread Paden
I checked to see if firewall rules were blocking it: there were no rules enabled, and just to be sure I turned off the firewall completely. It's still being blocked, but I did get a message from netstat that might help. tcp6 0 0 :::8983 :::*

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Paden
Yeah, I'm just gonna say hands down this was a totally bad question. My fault, mea culpa. I'm pretty new to working in an IDE and reading a stack trace (I just finished my first year of CS at university and now I'm interning). I'm actually kind of embarrassed by how long it took me to

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Paden
Yeah, changing the field to text_en or text_en_splitting made my indexer index all of my files. The only problem is, I don't think it's doing it well. I have two cores that I'm working with. Both of them have indexed the same set of files. The first core, which I will

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-19 Thread Paden
Yes, the number of indexed documents is correct, but the queries I perform fall short of what they should return. You're probably right, though; I probably have to create a better analyzer. And I'm not really worried about the other fields. I've already checked to see if it's storing them correctly and

Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden
Hello, I'm using Solr to pull information from a database and a file system simultaneously. The database houses the file path of each file in the file system. It pulls all of those just fine. In fact, it combines the metadata from the database and the metadata from the file system great. The

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden
class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />
</schema>
ENTIRE STACK TRACE
/home/paden/Documents/LWP_Files/BIGDATA/5974412.pdf
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/Testcore3

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden
Just rolling out a little more information as it comes in. I changed the field type in the schema to text_general and that didn't change a thing. Another thing is that it's consistently submitting/not submitting the same documents. I will run over it one time and it won't index a set of

TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
entity and just run the database draw, it works fine. I can run a query, and I get this output when I run a faceted search: response: { numFound: 283, start: 0, docs: [ { id: /home/paden/Documents/LWP_Files/BIGDATA/6220106.pdf, title: ENGINEERING INITIATION

Re: TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
I thought it might be useful to list the logging errors as well. Here they are. There are just three.
WARN FileDataSource FileDataSource.basePath is empty. Resolving to: /home/paden/Downloads/solr-5.1.0/server/.
ERROR DocBuilder Exception while processing: file document
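The WARN line says that, because `basePath` is empty, the FileDataSource falls back to Solr's working directory (`server/` here). If the file names handed to it are relative, they would presumably be resolved against that directory, which could explain files not being found. Java's own path resolution shows the general rule: a relative name is appended to the base, and an absolute name is left unchanged. The sketch below only illustrates that rule. It is not DIH's code. The class name and the relative file name are hypothetical, and it assumes Linux-style paths.

```java
import java.nio.file.Path;

// Sketch of base-directory resolution, analogous to what the WARN line describes.
class BasePathDemo {
    // Relative names are appended to the base directory; absolute names are returned unchanged.
    static Path resolve(String baseDir, String fileName) {
        return Path.of(baseDir).resolve(fileName).normalize();
    }

    public static void main(String[] args) {
        String base = "/home/paden/Downloads/solr-5.1.0/server";
        // Absolute path from the database: resolves to itself.
        System.out.println(resolve(base, "/home/paden/Documents/LWP_Files/BIGDATA/6220106.pdf"));
        // Hypothetical relative name: ends up under server/, where the file does not exist.
        System.out.println(resolve(base, "BIGDATA/6220106.pdf"));
    }
}
```

Storing absolute paths, or setting an explicit `basePath` on the FileDataSource, avoids depending on the working directory.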

Solr 5.1.0 - Where do I put the JDBC?

2015-06-15 Thread Paden
Hello, just a minor question. I'm using the Java Database Connectivity (JDBC) driver with the DIH to index from a MySQL database, but whenever I run the DIH for a full import it keeps giving me this error: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
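A quick way to confirm that the MySQL driver class is visible to a JVM is to try loading it by name. `com.mysql.jdbc.Driver` is the class name used by MySQL Connector/J 5.x, and newer releases use `com.mysql.cj.jdbc.Driver`. This check only tests the classpath of the JVM it runs in. Solr loads the jar from wherever its own configuration points, for example a `<lib>` directive in `solrconfig.xml` or the core's `lib` directory. Treat this as a diagnostic sketch with a hypothetical class name.

```java
// Sketch: reports whether JDBC driver classes can be loaded from this JVM's classpath.
class DriverCheck {
    // Returns true if the named class can be loaded.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"com.mysql.jdbc.Driver", "com.mysql.cj.jdbc.Driver"}) {
            System.out.println(name + " -> " + (onClasspath(name) ? "found" : "missing"));
        }
    }
}
```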

Re: Solr 5.1.0 - Where do I put the JDBC?

2015-06-15 Thread Paden
I'm using Jetty. That might be important.

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
You were very VERY helpful. Thank you very much. If I could bug you with one last question: do you know where the documentation is that would help me write my own indexer?

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
I do have a link between both sets of data: the file path, which could be indexed by both. I do, however, have large PDFs that need to be indexed. So just for clarification, I could write an indexer that used both the DIH and SolrCell to submit a combined record to Solr or

Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
I'm trying to figure out if Solr is a good fit for my project. I have two sets of data. On the one hand there is a bunch of files sitting in a local Linux file system. On the other is a set of metadata FOR the files that is located in a MySQL database. I need a program that

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
So you're saying that Tika can parse the text OUTSIDE of Solr? So I would still be able to process my PDFs with Tika, just outside of Solr specifically, correct?

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
The filepath is the key in both the filesystem and the database.
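Because the file path is the shared key, a custom indexer can treat the merge as a join on that key. It reads metadata rows from MySQL, extracts text with Tika outside Solr, and combines the two per path before sending one document to Solr. The stdlib-only sketch below covers just the join step. In-memory maps stand in for the database and the extractor. The `content` field and the class name are hypothetical. Using the path as `id` mirrors the query output shown earlier in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: joins metadata and extracted text on the file path.
class MergeByPath {
    // Produces one field map per document; paths with no extracted text get an empty "content".
    static List<Map<String, String>> merge(Map<String, Map<String, String>> metadataByPath,
                                           Map<String, String> textByPath) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : metadataByPath.entrySet()) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", e.getKey());   // the file path doubles as the uniqueKey
            doc.putAll(e.getValue());    // metadata columns from the database
            doc.put("content", textByPath.getOrDefault(e.getKey(), ""));
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> meta = new LinkedHashMap<>();
        meta.put("/docs/a.pdf", Map.of("author", "Snowman, Frosty T."));
        Map<String, String> text = Map.of("/docs/a.pdf", "extracted body text");
        System.out.println(merge(meta, text));
    }
}
```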

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
Both sources, the filesystem and the database, contain the file path for each individual file.

Re: Merging Sets of Data from Two Different Sources

2015-06-11 Thread Paden
So you're saying I could merge both the metadata in the database and their files in the file system into one queryable item in Solr just by customizing the DIH correctly and getting the right schema? (I'm sorry if this sounds like a redundant question but I've been trying to find an answer for