Hello,
I'm trying to figure out how to filter particular facet values out of my
results. I'm doing some Named Entity Extraction and putting the entities up as
faceting information. However, not all the results I get are exact. For
example, the string "w 5th street" will appear in the "Person" facet list.
Th
Hello all,
I've been trying to integrate NER into my solr search so I can get some
really good facets out of it. I've already managed to plug in a search
handler with code from searchbox.com to get a feel for how it works. And now
I'm trying to plug in an update request processor so I can pull fac
Pretty old thread, I know. But in the end it wasn't Solr. I'm fairly
certain that it was Tika. The autoparser wasn't pulling any of the ".doc"
file text. It came out as just blank. The documents were from 1997-2003. When I
opened them in Word 2010 and RESAVED them as 2010 documents they indexed
just f
Hey Shawn, when I use the -m 2g option in my script I get the error 'cannot
open [path]/server/logs/solr.log for reading: No such file or directory'. I
don't see how this would affect that.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Stays-Idle-tp4
Okay. I'm going to run the index again with specifications that you
recommended. This could take a few hours but I will post the entire trace on
that error when it pops up again and I will let you guys know the results of
increasing the heap size.
There are some zip files inside the directory, and they are referenced in
the database. I'm thinking those are the ones it's jumping right over. They
are not the issue; at least I'm 95% sure. And Shawn, if you're still watching,
I'm sorry: I'm using solr-5.1.0.
Yes, the number of unimported documents matches. No, I did not set commit to
"false" in any of my DataImportHandler configs. Since it defaults to true I
didn't really take it into account, though.
I was consistently checking the logs to see if there were any errors that
would explain the idling. There were no errors except for a few skipped
documents due to some IOExceptions from Tika, but none of those
occurred around the time that Solr began idling. A lot of font warnings. But
again
Hello,
I'm currently trying to index about 54,000 files with the Solr Data Import
Handler and I've got a small problem. It fetches about half (28,289) of the
54,000 files and processes about 14,146 documents before it stops and just
stands idle. Here's the status output:
{
"responseHeader": {
You were 100 percent right. I went back and checked the metadata looking for
multiple instances of the same file path. Both of the files had an extra set
of metadata with the same filepath. Thank you very much.
Those should be authors 280 and 281. Sorry.
Hello,
I've run into quite the snag and I'm wondering if anyone can help me out
here. So, the situation:
I am using the DataImportHandler to pull from a database and a Linux file
system. The database has the metadata; the file system has the document text. I
thought it had indexed all the files I had
Hello,
I'm trying to get some Solr highlighting going but I've run into a small
problem. When I set the pre and post tags with my own custom tag I get an
XML error
XML Parsing Error: mismatched tag. Expected: .
Location:
file:///home/paden/Downloads/solr-5.1.0/server/solr/Test
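A common cause of this kind of error, assuming the custom tags are configured in solrconfig.xml: the tag markup has to be XML-escaped inside the config file, or the config itself becomes malformed XML. A hedged sketch (the handler name and the em tag are illustrative, not from the thread):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <!-- escape the angle brackets so solrconfig.xml stays well-formed -->
    <str name="hl.simple.pre">&lt;em class="hit"&gt;</str>
    <str name="hl.simple.post">&lt;/em&gt;</str>
  </lst>
</requestHandler>
```

A CDATA section would work equally well instead of entity escapes.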
Haha no need to reinvent wheels. Especially when you don't know java. Just a
prototype anyway.
I made a very strong assumption that it was pulling the text as blank
because I would copy the EXACT same text from one file in the file system
and put it into another file under a different name, but in
I posted the code anyway just forgot to get rid of that line in the post.
Sorry
docs = new ArrayList();

public static void main(String[] args) {
    try {
        TikaSqlIndexer idxer = new
            TikaSqlIndexer("http://localhost:8983/solr/Testcore3");
        //idxer.Index();
Hello, I'm using the DIH to import some files from one of my local
directories. However, every single one of these files has the same first
page. So I want to skip that first page in order to optimize search.
Can this be accomplished by an instruction within the DataImportHandler or,
if not, how
Awesome. This looks like a great resource. Thanks!
Hello,
I've been trying to tune my search handler to get some better search results
and I just have a general question about the search handler.
This being the first time I've designed/implemented a search engine, I've
been "told" that other engines operate on a kind of layered search. By
l
Thank you! Thank you, thank you, thank you. That worked and it brought the
right results. Thanks. It was driving me crazy.
It just defaults to text anyway. I removed it entirely from the solrconfig and
never specify it in the Solr query portion, but it still defaults to text.
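For what it's worth, with edismax the fields to search are normally set through qf rather than df; a hedged solrconfig sketch (the handler name, fields, and boosts are illustrative):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- search these fields (with boosts) instead of one default field -->
    <str name="qf">author^2.0 title^1.5 text</str>
  </lst>
</requestHandler>
```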
Well, I've just been using an author's name: Last Name, First Name Middle
Initial. Like *Snowman, Frosty T.*
As for the debugging, I'm not really seeing anything that would help me
understand why the query fields aren't kicking in and only the default
fields are being used instead.
I do see that it is parsing th
Hello,
I'm trying to tune a search handler to get the results that I want. In the
solrconfig.xml I specify several different query fields for the edismax
query parser but it always seems to use the default fields instead.
For example and clarification, when I remove Author from the "df" list of
Hello,
I feel like this is a really basic question but I'm struggling to find the
answer. I'm trying to figure out what the HTTP request is that would limit the
scope of a search based on a facet. Say I performed a query and the facet
field request returns the top ten authors of the facet count and
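The usual pattern for drilling into a facet is to repeat the original query and add an fq filter query for the chosen facet value. A hedged sketch (the core and field names here are borrowed from elsewhere in the thread and may not match the actual setup; spaces would need URL-encoding in practice):

```
http://localhost:8983/solr/Testcore3/select?q=*:*
    &facet=true&facet.field=author
    &fq=author:"Snowman, Frosty T."
```

Each additional fq narrows the result set further without changing relevance scoring.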
Hello,
I'm having a slight "Catch-22" scenario going on with my Solr indexing
process. I'm using the DataImportHandler to pull a filepath from a database.
The problem is that Windows filepaths have the backslash character inside
their paths:
\\some\filepath
So when I insert this data into MySQL
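For reference, MySQL treats the backslash as an escape character inside string literals, so each literal backslash typically has to be doubled on insert. A hedged sketch (the table and column names are made up):

```sql
-- to store the path \\some\filepath, double every backslash in the literal
INSERT INTO documents (filepath) VALUES ('\\\\some\\filepath');
-- alternatively, the NO_BACKSLASH_ESCAPES sql_mode disables this escaping
```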
Hello,
I'm trying to custom-build my own Solr interface in Visual Studio instead
of using/modifying the original Velocity interface. I'm mostly doing this as
a learning exercise in building a UI; that's why I'm opting out of using it.
The problem is I'm pretty new and not sure where to begin. Mo
That did it Shawn. Thanks for the help!
I checked to see if the firewall rules were blocking it and there were no
rules enabled. Just to be sure, I turned off the firewall completely and
it's still being blocked, but I did get a line from netstat that might
help:
tcp6  0  0 :::8983  :::*  LISTEN
Hello,
I've set up a Solr server on my Linux Virtual Machine. Now I'm trying to
access it remotely on my Windows Machine using an http request from a
browser.
Any time I try to access it with a request such as
"http//localhost:8983/solr" I always get a connection error (with the server
running
Yes, the number of indexed documents is correct. But the queries I perform
fall short of what they should be. You're probably right, though; I probably
have to create a better analyzer.
And I'm not really worried about the other fields. I've already checked to see
if it's storing them correctly and i
Yeah, changing the field to "text_en" or "text_en_splitting" actually
made it so my indexer indexed all my files. The only problem is, I
don't think it's doing it well.
I have two cores that I'm working with. Both of them have indexed the same
set of files. The first core, which I will r
Yeah I'm just gonna say hands down this was a totally bad question. My fault,
mea culpa. I'm pretty new to working in an IDE environment and using a stack
trace (I just finished my first year of CS at University and now I'm
interning). I'm actually kind of embarrassed by how long it took me to
real
Just rolling out a little bit more information as it comes in. I changed the
field type in the schema to text_general and that didn't change a thing.
Another thing is that it's consistently submitting/not submitting the same
documents. I will run over it one time and it won't index a set of
docu
/home/paden/Documents/LWP_Files/BIGDATA/5974412.pdf
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/Testcore3: Exception writing
document id /home/paden/Documents/LWP_Files
Hello,
I'm using Solr to pull information from a Database and a file system
simultaneously. The database houses the file path of the file in the file
system. It pulls all of those just fine. In fact, it combines the metadata
from the database and the metadata from the file system great. The probl
I thought it might be useful to list the logging errors as well. Here they
are. There are just three.
WARN   FileDataSource  FileDataSource.basePath is empty. Resolving to:
/home/paden/Downloads/solr-5.1.0/server/.
ERROR  DocBuilder
Exception while processing: file document
I'd like to note that when I delete the second entity and just run the
database pull, it works fine. I can run a query, and I get this output when
I run a faceted search:
"response": {
"numFound": 283,
"start":
I'm using Jetty. That might be important.
Hello,
Just a minor question. I'm using the Java Database Connector with the DIH
trying to index from a MySQL database but whenever I run the DIH for a full
import it keeps giving me this error
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.data
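If the root cause turns out to be the JDBC driver not being on Solr's classpath (a common cause of DIH full-import failures like this), one usual fix is a lib directive in solrconfig.xml. A hedged sketch; the paths and jar names below are illustrative:

```xml
<!-- load the DataImportHandler jars and the MySQL JDBC driver -->
<lib dir="${solr.install.dir:..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib path="/path/to/mysql-connector-java-bin.jar" />
```

Dropping the driver jar into the core's lib/ directory is another commonly used option.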
The filepath is the key in both the filesystem and the database
Both sources, the filesystem and the database, contain the file path for each
individual file
So you're saying I could merge both the metadata in the database and the
files in the file system into one queryable item in Solr just by
customizing the DIH correctly and getting the right schema?
(I'm sorry if this sounds like a redundant question, but I've been trying to
find an answer for the
You were very VERY helpful. Thank you very much. If I could bug you for one
last question. Do you know where the documentation is that would help me
write my own indexer?
So you're saying that Tika can parse the text OUTSIDE of Solr, so I would
still be able to process my PDFs with Tika outside of Solr
specifically, correct?
I do have a link between both sets of data, and that would be the filepath,
which could be indexed from both.
I do, however, have large PDFs that do need to be indexed. So just for
clarification: could I write an indexer that used both the DIH and SolrCell
to submit a combined record to Solr, or would
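One way this is commonly wired up inside the DIH itself is a nested entity: an outer JDBC entity supplies the metadata and filepath, and an inner TikaEntityProcessor entity extracts the document text from that path, so each Solr document gets both. A hedged sketch of a data-config.xml (the connection details, table, and column names are made up):

```xml
<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/docmeta"
              user="user" password="pass"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- outer entity: one row per file, metadata from MySQL -->
    <entity name="meta" dataSource="db"
            query="SELECT id, author, filepath FROM documents">
      <!-- inner entity: Tika-extracted text for this row's filepath -->
      <entity name="file" dataSource="bin" processor="TikaEntityProcessor"
              url="${meta.filepath}" format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```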
I'm trying to figure out if Solr is a good fit for my project.
I have two sets of data. On one hand, there is a bunch of files sitting
in a local Linux file system. On the other, there is a set of metadata
FOR the files, located in a MySQL database.
I need a program that can