Hi Juan
I tried with the following code first:
final SolrQuery allDocumentsQuery = new SolrQuery();
allDocumentsQuery.setQuery("id:" + myId);
allDocumentsQuery.setFields("*");
allDocumentsQuery.setRows(1);
QueryResponse response = solr.query(allDocumentsQuery, METHOD.POST);
With this, only
Solr 3.3 has a Grouping feature. Is it practically the same as deduplication?
Here is my use case for duplicate removal -
We have many documents with similar (up to 99%) content. Upon some search
queries, almost all of them come up in the first-page results. Of all these
documents, essentially one is
Thanks Shawn.
If Solr writes this info to Disk as soon as possible (which is what I am
seeing) then ramBuffer setting seems to be misleading.
Anyone else has any thoughts on this?
-Saroj
On Mon, Aug 29, 2011 at 6:14 AM, Shawn Heisey s...@elyograg.org wrote:
On 8/28/2011 11:18 PM, roz dev
Hi all,
Currently I'm testing Solr's indexing performance, but unfortunately I'm
running into memory problems.
It looks like Solr is not closing the filestream after an exception, but I'm
not really sure.
The current system I'm using has 150GB of memory and while I'm indexing the
Deduplication uses Lucene's IndexWriter.updateDocument with the signature
term. I don't think it's possible, as a default feature, to choose which
document to index; the original would always have to be the last one indexed.
From the IndexWriter.updateDocument javadoc:
Updates a document by first deleting the document(s)
For phrase queries, you simply surround the text with
double quotes, e.g. "this is a phrase"...
Best
Erick
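A tiny helper makes the tip above concrete (the class and method names are illustrative, not from the thread):

```java
// Wrap free text in double quotes to form a Lucene/Solr phrase query term;
// any embedded double quotes are backslash-escaped first.
public final class PhraseHelper {
    public static String phrase(String text) {
        return "\"" + text.replace("\"", "\\\"") + "\"";
    }
}
```

Calling `phrase("this is a phrase")` yields the single quoted term `"this is a phrase"`.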
2011/8/29 Rode González r...@libnova.es:
Hi again.
In that case, you should be able to use a tokeniser to split
the input into phrases, though you will probably need to write
a custom
Why doesn't the singleton approach we talked about a few
days ago do this? It would create the object the first
time you asked for it, and return you the same one thereafter
Best
Erick
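A minimal sketch of the lazy singleton described above (double-checked locking; the class name is illustrative):

```java
// Lazily-created singleton: the first call to getInstance() builds the
// object, every later call returns that same object.
public final class Connector {
    private static volatile Connector instance;

    private Connector() {
        // a real implementation would open its MySQL connection here
    }

    public static Connector getInstance() {
        if (instance == null) {                  // fast path, no lock
            synchronized (Connector.class) {
                if (instance == null) {          // re-check under the lock
                    instance = new Connector();
                }
            }
        }
        return instance;
    }
}
```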
On Mon, Aug 29, 2011 at 11:04 AM, samuele.mattiuzzo samum...@gmail.com wrote:
it's how i'm doing it now...
Why couldn't you just give an outrageous distance (10) or something?
You have to have some kind of point you're asking for the distance
*from*, don't you?
Best
Erick
On Mon, Aug 29, 2011 at 5:09 PM, solrnovice manisha...@yahoo.com wrote:
Eric, thanks for the update, I thought solr 4.0
Will traffic be served with a non warmed index searcher at any
point
No. That's what auto-warming is all about.
More correctly, it depends on how you configure things in
your config file. There are entries like firstSearcher,
newSearcher and various autowarm counts, all of which
you set and the
I'd really think carefully before disabling unique IDs. If you do,
you'll have to manage the records yourself, so your next
delta-import will add more records to your search result, even
those that have been updated.
You might do something like make the uniqueKey the
concatenation of productid
Hi all,
Currently I'm testing Solr's indexing performance, but unfortunately I'm
running into memory problems.
It looks like Solr is not closing the filestream after an exception, but I'm
not really sure.
The current system I'm using has 150GB of memory and while I'm indexing the
Hi Eric,
Fields are lazy loading, content stored in solr and machine 32 gig.. solr
has 20 gig heap. There is no swapping.
As you see, we have many phrases in the same query. I couldn't find a way to
drop qtime to subseconds. Surprisingly, the non-shingled test has better qtime!
On Mon, Aug 29, 2011 at
Satish Talim, on 30/08/2011 05:42, wrote:
[...]
Is there a work-around wherein I can send an OpenBitSet object?
JavaBinCodec (used by default by solr) supports writing arrays. you can
getBits() from openbitset and throw them into the binary response
federico
my problem is i still don't understand where i have to put that singleton (or
how i can load it into solr)
i have my singleton class Connector for mysql, with all its methods defined.
Now what? This is the point i'm missing :(
--
View this message in context:
But how to throw? As a stream of bits?
Satish
On Tue, Aug 30, 2011 at 5:39 PM, Federico Fissore feder...@fissore.orgwrote:
Satish Talim, on 30/08/2011 05:42, wrote:
[...]
Is there a work-around wherein I can send an OpenBitSet object?
JavaBinCodec (used by default by solr) supports
Satish Talim, on 30/08/2011 14:22, wrote:
But how to throw? As a stream of bits?
getBits() return a long[]
add a long[] part to your response
rb.rsp.add("long_array", obs.getBits())
federico
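On the receiving side, the shipped long[] can be interrogated directly; a sketch assuming OpenBitSet's word layout (bit i lives in word i/64, at position i%64):

```java
// Test bit i of a bit set transported as a plain long[].
public final class BitsReader {
    public static boolean get(long[] words, int i) {
        int word = i >> 6;                 // i / 64
        long mask = 1L << (i & 63);        // i % 64
        return word < words.length && (words[word] & mask) != 0;
    }
}
```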
I have a string fieldtype defined as so:
<fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="true"/>
And I have a field defined as:
<field name="guid" type="string" indexed="true" stored="true" required="false"
/>
The fields are of this format:
92E8EF8FC9F362BBE0408CA5785A29D4
But in
This might work in conjunction with some post-processing to help pare
down the results, but the logic for the actual access to the data
is too complex to have entirely in solr.
On Mon, Aug 29, 2011 at 2:02 PM, Erick Erickson erickerick...@gmail.com wrote:
It's reasonable, but post-filtering
On 8/30/2011 12:57 AM, roz dev wrote:
Thanks Shawn.
If Solr writes this info to Disk as soon as possible (which is what I am
seeing) then ramBuffer setting seems to be misleading.
Anyone else has any thoughts on this?
The stored fields are only two of the eleven Lucene files in each
Hi All
I am trying to use FieldCollapsing feature in Solr. On the Solr admin
interface, I give ...&group=true&group.field=fieldA and I can see grouped
results.
But, I am not able to figure out how to read those results in that order on
java.
Something like: SolrDocumentList doclist =
Using the DirectSolrSpellChecker, I'm very interested in this.
According to https://issues.apache.org/jira/browse/SOLR-2585 some changes
need to be made to DirectSolrSpellChecker.
Does anybody know how to get this working?
Hi All:
I have a few fields that are of the form: A:2B or G:U2 and so on. I
would like to be able to search the field using a wildcard search like: A:2*
or G:U*. I have tried modifying the field_type definitions to allow for
such queries, but without any luck.
Could
hi Erick, thank you for the tip, i will try that option. Where can i find a
document that shows details of geodist arguments? When i googled, i did not
find one.
So this is what my query is like. I want the distance to be returned. i don't
know exactly what all to pass to geodist, as i couldn't find a
I had the same problem with a database here, and we discovered that every
item had its own product page, its own url. So, we decided that our unique
id had to be the url instead of using sql ids and id concatenations.
sometimes it works. You can store all ids if u need them for something, but
for
What version of Solr are you using, and how are you indexing?
DIH? SolrJ?
I'm guessing you're using Tika, but how?
Best
Erick
On Tue, Aug 30, 2011 at 4:55 AM, Marc Jacobs jacob...@gmail.com wrote:
Hi all,
Currently I'm testing Solr's indexing performance, but unfortunately I'm
running into
Can we see the output if you specify both
debugQuery=on&debug=true
the debug=true will show the time taken up with various
components, which is sometimes surprising...
Second, we never asked the most basic question, what are
you measuring? Is this the QTime of the returned response?
(which is the
My educated guess is that you're using Java for your indexer, and you're (or
something below is) doing a toString on a Java object. You're sending over a
Java object address, not the string itself. A simple change to your indexer
should fix this.
Erik
On Aug 30, 2011, at 08:42 ,
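The symptom is easy to reproduce with any array, since arrays don't override toString(); a sketch (the helper names are illustrative):

```java
import java.nio.charset.StandardCharsets;

// An array's default toString() prints a type tag and hash code - the
// "object address" mentioned above - not the content.
public final class ToStringPitfall {
    public static String wrong(byte[] data) {
        return data.toString();                           // e.g. "[B@1b6d3586"
    }

    public static String right(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);  // the actual text
    }
}
```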
OK, maybe I'm getting there. You put it into a .jar file, and
then in solrconfig.xml you create a <lib .../> directive that points
to where the jar file is. At that point, you can add your custom
class to the UpdateRequestProcessor as per Tomas' e-mail.
Best
Erick
On Tue, Aug 30, 2011 at 8:10 AM,
thanks Tommaso,
there is some problem in my solrconfig.xml.
now its fixed.
thanks again.
Have you looked at the XML (or JSON) response format?
You're right, it is different and you have to parse it
differently; there are more levels. Try this query
and you'll see the format (default data set).
http://localhost:8983/solr/select?q=*:*&group=on&group.field=manu_exact
Best
Erick
On Tue,
There is very little information to go on here, but at a
guess WordDelimiterFilterFactory is your problem.
have you looked at the admin/analysis page to try to figure
out what your analysis chain is doing?
Best
Erick
On Tue, Aug 30, 2011 at 9:46 AM, ramdev.wud...@thomsonreuters.com wrote:
Hi
q=*:*&sfield=store&pt=45.15,-93.85&fl=name,store,geodist()
Actually, you don't even have to specify the d=, I misunderstood.
Best
Erick
On Tue, Aug 30, 2011 at 9:56 AM, solrnovice manisha...@yahoo.com wrote:
hi Eric, thank you for the tip, i will try that option. Where can i find a
document that
Have you tried the debug page? See:
http://wiki.apache.org/solr/DataImportHandler#interactive
Best
Erick
On Tue, Aug 30, 2011 at 12:44 AM, vighnesh svighnesh...@gmail.com wrote:
hi all
I am facing a problem getting updated records from the database using a delta
query in Solr. Please give me the
Hmmm, I'm using DIH defined in data-config.xml
I have an Oracle data source configured using JDBC connect string.
On 8/30/11 10:41 AM, Erik Hatcher erik.hatc...@gmail.com wrote:
My educated guess is that you're using Java for your indexer, and you're
(or something below is) doing a toString
I was curious to know if anyone has any information about the relative
performance of document updates (delete/add operations) on documents
of different sizes. I have a use case in which I can either create
large Solr documents first and subsequently add a small amount of
information to them, or
Hi Erick
Yes, I did see the XML format. But, I did not understand how to read the
response using SolrJ.
I found some information about Collapse Component on googling, which looks
like a normal Solr XML results format.
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/
Document size should not have any impact on deleting documents, as they are
only marked for deletion.
On Tuesday 30 August 2011 17:06:05 Jeff Leedy wrote:
I was curious to know if anyone has any information about the relative
performance of document updates (delete/add operations) on documents
ok so my two singleton classes are MysqlConnector and JFJPConnector
basically:
1 - jar them
2 - cp them to /custom/path/within/solr/
3 - modify solrconfig.xml with <lib dir="/custom/path/within/solr/"/>
my two jars are then automatically loaded? nice!
in my CustomUpdateProcessor class i can call
Ahhh, see: https://issues.apache.org/jira/browse/SOLR-2637
Short form: It's in 3.4, not 3.3.
So, your choices are:
1 parse the XML yourself
2 get a current 3x build (as in one of the nightlies) and use SolrJ there.
Best
Erick
On Tue, Aug 30, 2011 at 11:09 AM, Sowmya V.B. vbsow...@gmail.com
Right, you're on track. Note that the changes you
make to solrconfig.xml require you to give the
qualified class name (e.g. org.myproj.myclass), but
it all just gets found man.
Also, it's not even necessary to be at a custom path within
Solr, although it does have to be *relative* to SOLR_HOME.
i think it's better for me to keep it under some solr installation path, i
don't want to lose files :)
ok, i'm going to try this out :) i already got into the package issue
(my.package.whatever) this one i know how to handle!
thanks for all the help, i'll post again to tell you It Works! (but
Hi,
I have a machine (win 2008R2) with 16GB RAM, I am having issue indexing 1/2GB
files. How do we avoid creating a SOLRInputDocument or is there any way to
directly use Lucene Index writer classes.
What would be the best approach. We need some suggestions.
Thanks,
Tirthankar
Eric, thank you for the quick update, so in the below query you sent to me, i
can also add any conditions right? i mean city:Boston and state:MA...etc ,
can i also use dismax query syntax?
The confusion from the beginning seems to be the version of solr i was
trying and the one you are trying.
Hi
I've read that it's possible add documents to slave machine:
http://wiki.apache.org/solr/SolrReplication#What_if_I_add_documents_to_the_slave_or_if_slave_index_gets_corrupted.3F
Is there any way to disallow adding documents to the slave machine? For
example, by touching the configuration files?
Ok. Figured it out. Thanks for the pointer. The field was of type RAW
in Oracle so it was being converted to a java string by DIH with the
behaviour below.
I just changed the SQL query in DIH to add RAWTOHEX(guid)
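For reference, RAWTOHEX's transformation is straightforward to mirror in Java, should the conversion ever need to happen on the indexer side instead of in SQL (a sketch, not from the thread):

```java
// Raw bytes -> uppercase hex, the same shape as the 32-character GUID
// strings stored in the index (e.g. 92E8EF8FC9F362BBE0408CA5785A29D4).
public final class RawToHex {
    public static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(String.format("%02X", b & 0xFF));  // mask to unsigned
        }
        return sb.toString();
    }
}
```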
On 8/30/11 11:03 AM, Twomey, David david.two...@novartis.com wrote:
Hmmm,
: We have a need to query and fetch millions of document ids from a Solr 3.3
: index and convert the same to a BitSet. To speed things up, we want to
: convert these document ids into OpenBitSet on the server side, put them into
: the response object and read the same on the client side.
This
Tomás Fernández Löbbe, on 29/08/2011 20:32, wrote:
You can use reflection to instantiate the correct object (specify the class
name on the parameter on the solrconfig and then invoke the constructor via
reflection). You'll have to manage the life-cycle of your object yourself.
If I
Below is the output of the debug. I am measuring pure Solr qtime, which shows
in the QTime field in the Solr XML.
<arr name="parsed_filter_queries">
  <str>mrank:[0 TO 100]</str>
</arr>
<lst name="timing">
  <double name="time">8584.0</double>
  <lst name="prepare">
    <double name="time">12.0</double>
    <lst
That's basically it.
remove all /update URLs from the slave config
On Tue, Aug 30, 2011 at 8:34 AM, Miguel Valencia
miguel.valen...@juntadeandalucia.es wrote:
Hi
I've read that it's possible add documents to slave machine:
http://wiki.apache.org/solr/SolrReplication#What_if_I_add_
what issues exactly ?
are you using 32 bit Java ? That will restrict the JVM heap size to 2GB max.
-Simon
On Tue, Aug 30, 2011 at 11:26 AM, Tirthankar Chatterjee
tchatter...@commvault.com wrote:
Hi,
I have a machine (win 2008R2) with 16GB RAM, I am having issue indexing
1/2GB files. How
Eric, can you please let me know the solr build, that you are using. I went
to this below site, but i want to use the same build, you are using, so i
can make sure the queries work.
http://wiki.apache.org/solr/FrontPage#solr_development
thanks
SN
: The current system I'm using has 150GB of memory and while I'm indexing the
: memory consumption is growing and growing (eventually more than 50GB).
: In the attached graph (http://postimage.org/image/acyv7kec/) I indexed about
: 70k of office-documents (pdf,doc,xls etc) and between 1 and 2
Hi Erick,
I am using Solr 3.3.0, but with 1.4.1 the same problems.
The connector is a homemade program in the C# programming language and is
posting via http remote streaming (i.e.
http://localhost:8080/solr/update/extract?stream.file=/path/to/file.doc&literal.id=1
)
I'm using Tika to extract the
: Ok. Figured it out. Thanks for the pointer. The field was of type RAW
: in Oracle so it was being converted to a java string by DIH with the
: behaviour below.
RAW is probably very similar to BLOB...
Thanks Everyone for the responses.
Yes, the way Eric described would work for trivial debugging but when i
actually need to debug something in production this would be a big hassle
;-)
For now I am going to mark the field to be stored=true to get around this
problem. We are migrating away from
I think i found the link to the nightly build, i am going to try this flavor
of solr and run the query and check what happens.
The link i am using is
https://builds.apache.org/job/Solr-trunk/lastSuccessfulBuild/artifact/artifacts/
thanks
SN
If you remove the single quotes from your query syntax it should work.
In general, using multivalued fields where you want to coordinate matches
based on the position in the multivalued field (ie: a multivalued list of
author first names and a multivalued list of author last names and you want
For indexing the webpages, you can use Nutch with Solr, which would do
the scraping and indexing of the page.
For finding similar documents/pages you can use
http://wiki.apache.org/solr/MoreLikeThis, by querying the above
document (by id or search terms) and it would return similar documents
from
Anyone else strugglin' with dismax's mm parameter?
We're having a problem here: it seems that configs for 3 terms and more are
being ignored by solr and it assumes the previous configs.
If I use <str name="mm">3&lt;1</str> or <str name="mm">3&lt;100%</str> I get
the same results for a 3-term query.
If I try <str
Hi Jayendra,
Thank you for the reply. I figured it out finally. I had to configure my web
servlet container, Jetty, for this. Now it works :-)
-
Sheetal
you might want to check - http://wiki.apache.org/solr/TermVectorComponent
Should provide you with the term vectors with a lot of additional info.
Regards,
Jayendra
On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
Hello,
This time I'm trying to duplicate Luke's
Hi Chris,
Thanks for the response.
Eventually I want to install Solr on a machine with a maximum memory of 4GB.
I tried to index the data on that machine before, but it resulted in index
locks and memory errors.
Is 4GB not enough to index 100,000 documents in a row? How much should it
be? Is there
i think i have to drop the singleton class solution, since my boss wants to
add 2 other different solr installations and i need to reuse the plugins i'm
working on... so i'll have to use a connection pool or i will create hangs
when the 3 cores update their indexes at the same time :(
I am trying
to peek into the index to see if my index-time synonym expansions are
working properly or not.
For this I have successfully used the analysis page of the admin application
that comes out of the box. Works really well for debugging schema changes.
JRJ
-Original Message-
Another way that occurs to me is that if you have a security constraint on the
update URL(s) in your web.xml, you can map them to no groups / empty groups in
the JEE container.
JRJ
-Original Message-
From: simon [mailto:mtnes...@gmail.com]
Sent: Tuesday, August 30, 2011 12:21 PM
To:
Also... Did he restart either his web app server container or at least the
Solr servlet inside the container?
JRJ
-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Friday, August 26, 2011 5:29 AM
To: solr-user@lucene.apache.org
Subject: Re: missing field in
Hmmm, I believe I discovered the problem.
When you have something like this:
2<-50% 6<-60%
you should read it from right to left and use the word MORE.
MORE THAN SIX clauses, 60% are optional; MORE THAN TWO clauses (and that
includes 3, 4, 5 and 6), half is mandatory.
If you want a special
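The right-to-left reading above can be sketched in code; this is a simplified take on how dismax resolves a conditional mm spec (modeled on Solr's SolrPluginUtils.calculateMinShouldMatch, but handling only ascending "n<value" pairs with plain-integer or percentage values):

```java
// For a spec like "2<-50% 6<-60%": with up to 2 optional clauses all are
// required; above 2, 50% are optional; above 6, 60% are optional.
public final class MmSketch {
    public static int minShouldMatch(String spec, int optionalClauses) {
        int required = optionalClauses;              // default: all required
        for (String part : spec.trim().split("\\s+")) {
            String[] kv = part.split("<");
            int upperBound = Integer.parseInt(kv[0]);
            if (optionalClauses <= upperBound) {
                break;                               // this and later rules don't apply
            }
            String value = kv[1];
            if (value.endsWith("%")) {
                int pct = Integer.parseInt(value.substring(0, value.length() - 1));
                int calc = optionalClauses * pct / 100;   // integer division
                required = calc < 0 ? optionalClauses + calc : calc;
            } else {
                required = Integer.parseInt(value);
            }
        }
        return required;
    }
}
```

With "2<-50% 6<-60%", a 2-term query requires both terms, a 3-term query requires 2 of them, and a 7-term query requires 3.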
See solrconfig.xml, particularly ramBufferSizeMB,
also maxBufferedDocs.
There's no reason you can't index as many documents
as you want, unless your documents are absolutely
huge (as in 100s of M, possibly G size).
Are you actually getting out of memory problems?
Erick
On Tue, Aug 30, 2011 at
Well, your singleton can be the connection
pool manager..
Best
Erick
On Tue, Aug 30, 2011 at 4:45 PM, samuele.mattiuzzo samum...@gmail.com wrote:
i think i have to drop the singleton class solution, since my boss wants to
add 2 other different solr installation and i need to reuse the
Yep, that one takes a while to figure out, then
I wind up re-figuring it out every time I have
to change it <G>...
Best
Erick
On Tue, Aug 30, 2011 at 6:36 PM, Alexei Martchenko
ale...@superdownloads.com.br wrote:
Hmmm I believe I discovered the problem.
When you have something like this:
2<-50%
OK, I'll have to defer because this makes no sense.
4+ seconds in the debug component?
Sorry I can't be more help here, but nothing really
jumps out.
Erick
On Tue, Aug 30, 2011 at 12:45 PM, Lord Khan Han khanuniver...@gmail.com wrote:
Below the output of the debug. I am measuring pure solr
That should be fine. I'm not actually sure what version of Trunk I
have, I update it sporadically and build from scratch. But the last
successful build artifacts will certainly have the pseudo-field
return of function in it, so you should be fine.
Best
Erick
On Tue, Aug 30, 2011 at 2:33 PM,
hi Erik, today i had the distance working. Since the solr version under
LucidImagination is not returning geodist(), I downloaded Solr 4.0 from the
nightly build. On lucid we had the full schema defined. So i copied that
schema to the example directory of solr-4 and removed all references to
I am using Solr 1.3. I update the Solr index through a delta-import every two
hours, but the delta-import is wasteful of database connections.
So I want to use a full-import with an entity name instead of the delta-import.
My db-data-config.xml file:
<entity name="article" pk="Article_ID" query="select
Lucid also has an online forum for questions about the LucidWorksEnterprise
product:
http://www.lucidimagination.com/forum/lwe
The Lucid Imagination engineers all read the forum and endeavor to quickly
answer questions like this.
On Tue, Aug 30, 2011 at 6:09 PM, solrnovice manisha...@yahoo.com
So I looked at doing this, but I don't see a way to get the scores
from the docs as well. Am I missing something in that regards?
On Mon, Aug 29, 2011 at 8:53 PM, Jamie Johnson jej2...@gmail.com wrote:
Thanks Hoss. I am actually ok with that, I think something like
50,000 results from each
I was not referring to Lucene's doc ids but the doc numbers (unique key)
Satish
On Tue, Aug 30, 2011 at 9:28 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:
: We have a need to query and fetch millions of document ids from a Solr
3.3
: index and convert the same to a BitSet. To speed
Hi all:
I'm using SolrCloud for distributed search. Everything works well, but
there is a small problem. Each node searches quickly and passes the data
up to the top-level request. I found a request like
q=solr&ids=id1,id2,id3,id4,id5,id6...,id10.
Solr handles this request with a 'for'
hi Lance, thanks for the link, i went to their site, lucidimagination forum,
when i searched on geodist, i see my own posts. Is this forum part of
lucidimagination?
Just curious.
thanks
SN
Found score, so this works for regular queries but now I'm getting an
exception when faceting.
SEVERE: Exception during facet.field of type:java.lang.NullPointerException
at
org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:451)
at
Hello,
What is the best way to remove duplicate values on output. I am using the
following query:
/solr/select/?q=wrt54g2&version=2.2&start=0&rows=10&indent=on&fl=productid
And I get the following results:
<doc>
  <int name="productid">1011630553</int>
</doc>
<doc>
  <int name="productid">1011630553</int>
</doc>
<doc><int
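If the index can't be changed, one client-side workaround is to collapse the returned productid values while keeping their order (a sketch; inside Solr itself, grouping or the deduplication update processor are the usual answers):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Drop repeated productid values, preserving first-seen order.
public final class DedupIds {
    public static List<String> dedup(List<String> productIds) {
        return new ArrayList<>(new LinkedHashSet<>(productIds));
    }
}
```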
The Term Vector Component (TVC) is a SearchComponent designed to return
information about documents that is stored when setting the termVector
attribute on a field:
Will I have to re-index after adding that to the schema?
On Tue, Aug 30, 2011 at 11:06 PM, Jayendra Patil
I have 1000's of cores, and to reduce the cost of loading/unloading
schema.xml, I have my solr.xml set up as mentioned here -
http://wiki.apache.org/solr/CoreAdmin
namely:
<solr>
  <cores adminPath="/admin/cores" shareSchema="true">
  ...
  </cores>
</solr>
However, I am not sure where to keep the common