Hello,
I'm trying to figure out how to filter particular facet values out of my
results. I'm doing some Named Entity Extraction and putting the entities up as
faceting information. However, not all the results I get are exact. For
example, the string "w 5th street" will appear in the "Person" facet list.
Th
Hello all,
I've been trying to integrate NER into my solr search so I can get some
really good facets out of it. I've already managed to plug in a search
handler with code from searchbox.com to get a feel for how it works. And now
I'm trying to plug in an update request processor so I can pull fac
Pretty old thread, I know. But in the end it wasn't Solr. I'm fairly
certain that it was Tika. The autoparser wasn't pulling any of the ".doc"
file text. It came out as just blank. The documents were from 1997-2003. When I
opened them in Word 2010 and RESAVED them as 2010 documents they indexed
just f
Hey Shawn, when I use the -m 2g option in my script I get the error 'cannot
open [path]/server/logs/solr.log for reading: No such file or directory'. I
don't see how this would affect that.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Stays-Idle-tp4
Okay. I'm going to run the index again with specifications that you
recommended. This could take a few hours but I will post the entire trace on
that error when it pops up again and I will let you guys know the results of
increasing the heap size.
There are some zip files inside the directory, and they are referenced in
the database. I'm thinking those are the ones it's jumping right over. They
are not the issue; at least I'm 95% sure. And Shawn, if you're still watching,
I'm sorry: I'm using solr-5.1.0.
Yes, the number of unimported documents matches. No, I did not set commit to
"false" in any of my DataImportHandler configs. Since it defaults to true I
didn't really take it into account, though.
I was consistently checking the logs to see if there were any errors that
would explain the idling. There were no errors except for a few skipped
documents due to some IOExceptions from Tika, but none of those
occurred around the time that Solr began idling. A lot of font warnings. But
again
Hello,
I'm currently trying to index about 54,000 files with the Solr Data Import
Handler and I've got a small problem. It fetches about half (28,289) of the
54,000 files and processes about 14,146 documents before it stops and just
stands idle. Here's the status output:
{
"responseHeader": {
You were 100 percent right. I went back and checked the metadata looking for
multiple instances of the same file path. Both of the files had an extra set
of metadata with the same filepath. Thank you very much.
Those should be authors 280 and 281. Sorry.
Hello,
I've run into quite the snag and I'm wondering if anyone can help me out
here. So, the situation:
I am using the DataImportHandler to pull from a database and a Linux file
system. The database has the metadata; the file system has the document text. I
thought it had indexed all the files I had
Hello,
I'm trying to get some Solr highlighting going but I've run into a small
problem. When I set the pre and post tags with my own custom tag I get an
XML error
XML Parsing Error: mismatched tag. Expected: .
Location:
file:///home/paden/Downloads/solr-5.1.0/server/solr/Test
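A common cause of this kind of error, assuming the custom tags are configured in solrconfig.xml: the tag markup has to be XML-escaped inside the config file, or the config itself becomes malformed XML. A hedged sketch (the handler name and the em tag are illustrative, not from the thread):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <!-- escape the angle brackets so solrconfig.xml stays well-formed -->
    <str name="hl.simple.pre">&lt;em class="hit"&gt;</str>
    <str name="hl.simple.post">&lt;/em&gt;</str>
  </lst>
</requestHandler>
```

A CDATA section would work equally well instead of entity escapes.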
Haha no need to reinvent wheels. Especially when you don't know java. Just a
prototype anyway.
I made a very strong assumption that it was pulling the text as blank
because I would copy the EXACT same text from one file in the file system
and put it into another file under a different name, but in
I posted the code anyway just forgot to get rid of that line in the post.
Sorry
docs = new ArrayList();

public static void main(String[] args) {
    try {
        TikaSqlIndexer idxer = new
            TikaSqlIndexer("http://localhost:8983/solr/Testcore3");
        //idxer.Index();
Hello, I'm using the DIH to import some files from one of my local
directories. However, every single one of these files has the same first
page. So I want to skip that first page in order to optimize search.
Can this be accomplished by an instruction within the DataImportHandler or,
if not, how
Awesome. This looks like a great resource. Thanks!
Hello,
I've been trying to tune my search handler to get some better search results
and I just have a general question about the search handler.
This being the first time I've designed/implemented a search engine, I've
been "told" that other engines operate on a kind of layered search. By
l
Thank you! Thank you, thank you, thank you. That worked and it brought the
right results. Thanks. It was driving me crazy.
It just defaults to text anyway. I removed it entirely from the solrconfig and
never specify it in the Solr query portion, but it still defaults to text.
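For what it's worth, with edismax the fields to search are normally set through qf rather than df; a hedged solrconfig sketch (the handler name, fields, and boosts are illustrative):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- search these fields (with boosts) instead of one default field -->
    <str name="qf">author^2.0 title^1.5 text</str>
  </lst>
</requestHandler>
```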
Well, I've just been using an author's name: Last Name, First Name Middle
Initial. Like *Snowman, Frosty T.*
As for the debugging, I'm not really seeing anything that would help me
understand why the query fields aren't kicking in and only the default
fields are being used instead.
I do see that it is parsing th
Hello,
I'm trying to tune a search handler to get the results that I want. In the
solrconfig.xml I specify several different query fields for the edismax
query parser but it always seems to use the default fields instead.
For example and clarification, when I remove Author from the "df" list of
Hello,
I feel like this is a really basic question but I'm struggling to find the
answer. I'm trying to figure out what the HTTP request is that would limit the
scope of a search based on a facet. Say I performed a query and the facet
field request returns the top ten authors of the facet count and
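The usual pattern for drilling into a facet is to repeat the original query and add an fq filter query for the chosen facet value. A hedged sketch (the core and field names here are borrowed from elsewhere in the thread and may not match the actual setup; spaces would need URL-encoding in practice):

```
http://localhost:8983/solr/Testcore3/select?q=*:*
    &facet=true&facet.field=author
    &fq=author:"Snowman, Frosty T."
```

Each additional fq narrows the result set further without changing relevance scoring.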
Hello,
I'm having a slight "Catch-22" scenario going on with my Solr indexing
process. I'm using the DataImportHandler to pull a filepath from a database.
The problem is that Windows filepaths have the backslash character inside
their paths:
\\some\filepath
So when I insert this data into MySQL
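For reference, MySQL treats the backslash as an escape character inside string literals, so each literal backslash typically has to be doubled on insert. A hedged sketch (the table and column names are made up):

```sql
-- to store the path \\some\filepath, double every backslash in the literal
INSERT INTO documents (filepath) VALUES ('\\\\some\\filepath');
-- alternatively, the NO_BACKSLASH_ESCAPES sql_mode disables this escaping
```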
Hello,
I'm trying to custom-build my own Solr interface in Visual Studio instead
of using/modifying the original Velocity interface. I'm mostly doing this as
a learning exercise in building a UI; that's why I'm opting out of using it.
The problem is I'm pretty new and not sure where to begin. Mo
That did it Shawn. Thanks for the help!
I checked to see if the firewall rules were blocking it and there were no
rules enabled. Just to be sure, I turned off the firewall completely and
it's still being blocked, but I did get a line from netstat that might
help:
tcp6  0  0 :::8983  :::*  LISTEN
Hello,
I've set up a Solr server on my Linux Virtual Machine. Now I'm trying to
access it remotely on my Windows Machine using an http request from a
browser.
Any time I try to access it with a request such as
"http//localhost:8983/solr" I always get a connection error (with the server
running
Yes, the number of indexed documents is correct. But the queries I perform
fall short of what they should be. You're probably right, though; I probably
have to create a better analyzer.
And I'm not really worried about the other fields. I've already checked to see
if it's storing them correctly and i
Yeah, changing the field to "text_en" or "text_en_splitting" actually
made it so my indexer indexed all my files. The only problem is, I
don't think it's doing it well.
I have two cores that I'm working with. Both of them have indexed the same
set of files. The first core, which I will r
Yeah I'm just gonna say hands down this was a totally bad question. My fault,
mea culpa. I'm pretty new to working in an IDE environment and using a stack
trace (I just finished my first year of CS at University and now I'm
interning). I'm actually kind of embarrassed by how long it took me to
real
Just rolling out a little bit more information as it comes in. I changed the
field type in the schema to text_general and that didn't change a thing.
Another thing is that it's consistently submitting/not submitting the same
documents. I will run over it one time and it won't index a set of
docu
/home/paden/Documents/LWP_Files/BIGDATA/5974412.pdf
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/Testcore3: Exception writing
document id /home/paden/Documents/LWP_Files
Hello,
I'm using Solr to pull information from a Database and a file system
simultaneously. The database houses the file path of the file in the file
system. It pulls all of those just fine. In fact, it combines the metadata
from the database and the metadata from the file system great. The probl
I thought it might be useful to list the logging errors as well. Here they
are. There are just three.
WARN   FileDataSource  FileDataSource.basePath is empty. Resolving to:
/home/paden/Downloads/solr-5.1.0/server/.
ERROR  DocBuilder
Exception while processing: file document
I'd like to note that when I delete the second entity and just run the
database pull, it works fine. I can run a query, and I get this output when
I run a faceted search:
"response": {
"numFound": 283,
"start":
I'm using Jetty. That might be important.
Hello,
Just a minor question. I'm using the Java Database Connector with the DIH
trying to index from a MySQL database but whenever I run the DIH for a full
import it keeps giving me this error
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.data
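If the root cause turns out to be the JDBC driver not being on Solr's classpath (a common cause of DIH full-import failures like this), one usual fix is a lib directive in solrconfig.xml. A hedged sketch; the paths and jar names below are illustrative:

```xml
<!-- load the DataImportHandler jars and the MySQL JDBC driver -->
<lib dir="${solr.install.dir:..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib path="/path/to/mysql-connector-java-bin.jar" />
```

Dropping the driver jar into the core's lib/ directory is another commonly used option.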
The filepath is the key in both the filesystem and the database
Both sources, the filesystem and the database, contain the file path for each
individual file
So you're saying I could merge both the metadata in the database and the
files in the file system into one queryable item in Solr just by
customizing the DIH correctly and getting the right schema?
(I'm sorry if this sounds like a redundant question, but I've been trying to
find an answer for the
You were very VERY helpful. Thank you very much. If I could bug you for one
last question. Do you know where the documentation is that would help me
write my own indexer?
So you're saying that Tika can parse the text OUTSIDE of Solr, so I would
still be able to process my PDFs with Tika outside of Solr
specifically, correct?
I do have a link between both sets of data, and that would be the filepath,
which could be indexed from both.
I do, however, have large PDFs that do need to be indexed. So just for
clarification: could I write an indexer that used both the DIH and SolrCell
to submit a combined record to Solr, or would
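One way this is commonly wired up inside the DIH itself is a nested entity: an outer JDBC entity supplies the metadata and filepath, and an inner TikaEntityProcessor entity extracts the document text from that path, so each Solr document gets both. A hedged sketch of a data-config.xml (the connection details, table, and column names are made up):

```xml
<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/docmeta"
              user="user" password="pass"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- outer entity: one row per file, metadata from MySQL -->
    <entity name="meta" dataSource="db"
            query="SELECT id, author, filepath FROM documents">
      <!-- inner entity: Tika-extracted text for this row's filepath -->
      <entity name="file" dataSource="bin" processor="TikaEntityProcessor"
              url="${meta.filepath}" format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```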
I'm trying to figure out if Solr is a good fit for my project.
I have two sets of data. On one hand, there is a bunch of files sitting
in a local Linux file system. On the other, there is a set of metadata
FOR the files, located in a MySQL database.
I need a program that can