Re: Can defragmentation improve performance on a SATA class 10 disk (~10000 rpm)?

2021-02-22 Thread Dario Rigolin
Hi Danilo, in my experience an SSD or a RAM disk is now the only way to
speed up queries; it depends on how much storage your 41M docs occupy.
If you don't have enterprise SSDs you can add a consumer SSD as a fast cache
(the Linux caching modules flashcache/bcache can use a cheap SSD as a
data cache while keeping your data safely stored on the SATA disks).

I don't think you can increase performance without changing the technology of
the storage system.
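That said, if you still want to try reducing the segment count before changing
hardware, a forceMerge can also be triggered from SolrJ; a minimal sketch (the core
URL is a placeholder, and the merge needs heavy I/O plus roughly the index size again
in free disk):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ForceMergeExample {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL; point it at your own standalone core.
        String coreUrl = "http://localhost:8983/solr/mycore";
        try (HttpSolrClient client = new HttpSolrClient.Builder(coreUrl).build()) {
            // optimize(waitFlush, waitSearcher, maxSegments): merges the index
            // down to one segment. Expect heavy I/O and a long run on SATA disks.
            client.optimize(true, true, 1);
        }
    }
}

In my experience, though, it will not compensate for the slow disk.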

Regards.
Dario

On Mon, 22 Feb 2021 at 08:52, Danilo Tomasoni wrote:

> Hello all,
> we are running a Solr instance with around 41 million documents on a SATA
> class 10 disk at around 10,000 rpm.
> We are experiencing very slow query responses (on the order of hours)
> with an average of 205 segments.
> We ran a test with an ordinary PC and an SSD disk, and there the same Solr
> instance with the same data and the same number of segments was around 45
> times faster.
> We also tried a force optimize to improve performance, but it was very
> slow, so we abandoned it.
>
> Since we still don't have enterprise server SSD disks, we are now
> wondering whether, in the meantime, defragmenting the solrdata folder can help.
> The idea is that, due to many updates, each segment file is fragmented
> across different physical blocks.
> Put another way, each segment file is non-contiguous on disk, and this
> can slow down the Solr response.
>
> What do you suggest?
> Is this somewhat equivalent to a force-optimize, or can it be faster?
>
> Thank you.
> Danilo
>
> Danilo Tomasoni
>
> Fondazione The Microsoft Research - University of Trento Centre for
> Computational and Systems Biology (COSBI)
> Piazza Manifattura 1,  38068 Rovereto (TN), Italy
> tomas...@cosbi.eu
> http://www.cosbi.eu
>
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: Almost all nodes in SolrCloud dead suddenly

2020-06-23 Thread Dario Rigolin
 server at: null
>
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException
> occured when talking to server at: null
>
> Caused by: java.nio.channels.ClosedChannelException
>
>
>
> Server 6:
>
>  + org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://server6:8983/solr/mycollection_shard2_replica_n15/select
>
>  + + org.apache.solr.client.solrj.SolrServerException: Timeout occured
> while waiting response from server at: Timeout occured while waiting
> response from server at:
> http://server4:8983/mycollection_shard6_replica_n23/select
>
>
>
> I tried searching Google but didn't find any clue :( Can you help me find
> the cause? Thank you!
>
>
>
>
>

-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: Cause of java.io.IOException: No space left on device Error

2020-04-23 Thread Dario Rigolin
When Solr starts an optimization of the index, it needs at least the same amount of
free space as the core being optimized (I don't know whether the often-quoted 3x
figure is correct).
Maybe your free space isn't enough to handle the optimization process.
Sometimes you have to restart the Solr process to get the space released back to the
filesystem; a couple of times Solr didn't release all of it.



On Thu, 23 Apr 2020 at 10:23, Kayak28 wrote:

> Hello, Community:
>
> I am currently using Solr 5.3.1 on CentOS.
> The other day, I hit an error message that says
> "java.io.IOException: No space left on device"
>
> The disk holding Solr has about 35GB of free space
> and a total capacity of 581GB.
>
> I suspected there were not enough Linux inodes,
> but inodes are still available (IUse% was 1%).
>
> One of my Solr cores has 47GB of indexes.
>
> Is it possible that the error happens when I run a forceMerge on the
> big core
> (I believe optimize temporarily needs up to 3 times the index size)?
> Or is there any other possible cause of the error?
>
>
> Any Clues are very helpful.
>
> --
>
> Sincerely,
> Kaya
> github: https://github.com/28kayak
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: Time out problems with the Solr server 8.4.1

2020-02-27 Thread Dario Rigolin
I think the issue is NFS. If you move everything to an NVMe or SSD local to the
server, the indexing process will work fine.
NFS is the wrong filesystem for Solr.

I hope this helps.

On Thu, 27 Feb 2020 at 00:03, Massimiliano Randazzo <massimiliano.randa...@gmail.com> wrote:

> On Wed, 26 Feb 2020 at 23:42, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
>
> > Hi Massimiliano,
> >
> > it’s not clear how much memory you have configured for your Solr
> instance.
> >
>
> SOLR_HEAP="20480m"
> SOLR_JAVA_MEM="-Xms20480m -Xmx20480m"
> GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
>   -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
>
> > And I would avoid an nfs mount for the datadir.
> >
> > Ciao,
> > Vincenzo
> >
> > --
> > mobile: 3498513251
> > skype: free.dev
> >
> > > On 26 Feb 2020, at 19:44, Massimiliano Randazzo <
> > massimiliano.randa...@gmail.com> wrote:
> > >
> > > On Wed, 26 Feb 2020 at 19:30, Dario Rigolin <dario.rigo...@comperio.it> wrote:
> > >
> > >> You can avoid explicit commits and let Solr autocommit at set intervals.
> > >> Or use soft commits if you have search queries to answer at the same time.
> > >> 55 pages of 3,500 words isn't a big deal for a Solr server; what's the
> > >> hardware configuration?
> > > The solr instance runs on a server with the following configuration:
> > > 12-core Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
> > > 64GB RAM
> > > Solr's dataDir is on a volume of another server mounted via NFS (I
> > > was thinking of moving the Solr server to the server where the dataDir
> > > resides, even though it has a weaker spec: 8-core Intel(R) Xeon(R) CPU
> > > E5506 @ 2.13GHz, 24GB RAM)
> > >
> > > What is a single Solr document for you: a single newspaper? A single page?
> > >
> > > A single Solr document corresponds to a single word of the document.
> > >
> > >
> > >> Do you have a SolrCloud with 8 nodes? Or are you sending the same documents
> > >> to 8 standalone Solr servers?
> > > I have 8 servers that process the 550,000 newspaper pages, and all of them
> > > write to 1 Solr server only.
> > >
> > >
> > >>> On Wed, 26 Feb 2020 at 19:22, Massimiliano Randazzo <massimiliano.randa...@gmail.com> wrote:
> > >>> Good morning,
> > >>> here is my situation: I have to index the OCR of about 550,000
> > >>> pages of newspapers, averaging 3,500 words per page, and since I make
> > >>> one document per word the records are many.
> > >>> At the moment I have 1 instance of Solr and 8 servers that all read and
> > >>> write to the same instance at the same time. At the beginning everything
> > >>> is fine; after a while, when I add, delete or commit, I get a TimeOut
> > >>> error from the Solr server.
> > >>> I suspect the problem is that I do many commit operations on many docs
> > >>> at a time (practically, if the newspaper is 30 pages I do 105,000 adds
> > >>> and commit at the end); if all 8 servers do this within a short time of
> > >>> each other, I think it creates problems for Solr.
> > >>> What can I do to solve the problem?
> > >>> Should I commit after each add?
> > >>> Is it possible to configure the Solr server to apply the add and delete
> > >>> commands and commit them on its own, according to the available
> > >>> resources, as it seems to do for the optimize command?
> > >>> Reading the documentation I found this configuration to implement, but
> > >>> I don't know whether it solves my problem:
> > >>> <deletionPolicy class="solr.SolrDeletionPolicy">
> > >>>   <str name="maxCommitsToKeep">1</str>
> > >>>   <str name="maxOptimizedCommitsToKeep">0</str>
> > >>>   <str name="maxCommitAge">1DAY</str>
> > >>> </deletionPolicy>
> > >>> <infoStream>false</infoStream>
> > >>> Thanks for your consideration
> > >>> Massimiliano Randazzo
> > >> --
> > >> Dario Rigolin
> > >> Comperio srl - CTO
> > >> Mobile: +39 347 7232652 - Office: +39 0425 471482
> > >> Skype: dario.rigolin
> > >
> > >
> > > --
> > > Massimiliano Randazzo
> > >
> > > Analista Programmatore,
> > > Sistemista Senior
> > > Mobile +39 335 6488039
> > > email: massimiliano.randa...@gmail.com
> > > pec: massimiliano.randa...@pec.net
> >
>
>
> --
> Massimiliano Randazzo
>
> Analista Programmatore,
> Sistemista Senior
> Mobile +39 335 6488039
> email: massimiliano.randa...@gmail.com
> pec: massimiliano.randa...@pec.net
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: Time out problems with the Solr server 8.4.1

2020-02-26 Thread Dario Rigolin
You can avoid explicit commits and let Solr autocommit at set intervals.
Or use soft commits if you have search queries to answer at the same time.
55 pages of 3,500 words isn't a big deal for a Solr server; what's the
hardware configuration?
What is a single Solr document for you: a single newspaper? A single page?
Do you have a SolrCloud with 8 nodes, or are you sending the same documents to 8
standalone Solr servers?
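For example, a minimal SolrJ sketch of adding documents without any explicit commit,
relying on commitWithin (the core URL, field names and the 30-second window are
placeholder assumptions):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        String coreUrl = "http://localhost:8983/solr/ocr"; // placeholder core
        try (HttpSolrClient client = new HttpSolrClient.Builder(coreUrl).build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "page-1-word-42"); // assumed field names
            doc.addField("word", "example");
            // No explicit commit: ask Solr to make the add visible within 30s,
            // so eight indexing clients never issue competing commits.
            client.add(doc, 30_000);
        }
    }
}

This way the commit frequency is decided in one place (the server) instead of by
eight clients at once.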

On Wed, 26 Feb 2020 at 19:22, Massimiliano Randazzo <massimiliano.randa...@gmail.com> wrote:

> Good morning,
>
> here is my situation: I have to index the OCR of about 550,000
> pages of newspapers, averaging 3,500 words per page, and since I make a
> document per word the records are many.
>
> At the moment I have 1 instance of Solr and 8 servers that all read and write
> to the same instance at the same time. At the beginning everything is
> fine; after a while, when I add, delete or commit, I get a TimeOut error
> from the Solr server.
>
> I suspect the problem is that I do many commit operations on many docs at a
> time (practically, if the newspaper is 30 pages I do 105,000 adds and commit
> at the end); if all 8 servers do this within a short time of each other, I
> think it creates problems for Solr.
>
> What can I do to solve the problem?
> Should I commit after each add?
> Is it possible to configure the Solr server to apply the add and delete
> commands and commit them on its own, according to the available resources,
> as it seems to do for the optimize command?
> Reading the documentation I found this configuration to implement, but I
> don't know whether it solves my problem:
>
> <deletionPolicy class="solr.SolrDeletionPolicy">
>   <str name="maxCommitsToKeep">1</str>
>   <str name="maxOptimizedCommitsToKeep">0</str>
>   <str name="maxCommitAge">1DAY</str>
> </deletionPolicy>
> <infoStream>false</infoStream>
>
>
>
> Thanks for your consideration
> Massimiliano Randazzo
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: Optimize solr 8.4.1

2020-02-26 Thread Dario Rigolin
Hi Massimiliano, the only way to reindex is to resend all the documents to the
indexer of the SolrCloud instance.
At the moment Solr isn't able to do it by itself when the schema changes, or to
"send" already-indexed data from a standalone instance to a SolrCloud.

For example, we keep a stored-only field in Solr containing the original document,
and we use that data as the source for a new reindex.
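As an illustration of that pattern, a rough SolrJ sketch of reindexing from a
stored-only source field (the URLs and the source_doc field name are assumptions,
and cursor-based deep paging is omitted for brevity):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReindexFromStoredField {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient source =
                 new HttpSolrClient.Builder("http://oldhost:8983/solr/core1").build();
             HttpSolrClient target =
                 new HttpSolrClient.Builder("http://newhost:8983/solr/collection1").build()) {
            SolrQuery q = new SolrQuery("*:*");
            q.setFields("id", "source_doc"); // "source_doc" = assumed stored-only field
            q.setRows(500);                  // first page only; page or cursorMark in real use
            for (SolrDocument d : source.query(q).getResults()) {
                SolrInputDocument in = new SolrInputDocument();
                in.addField("id", d.getFieldValue("id"));
                // Re-map / re-parse the stored original here, so the new schema
                // (e.g. without stopwords) is applied while indexing.
                in.addField("source_doc", d.getFieldValue("source_doc"));
                target.add(in, 60_000);
            }
        }
    }
}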

Regards.


On Wed, 26 Feb 2020 at 14:37, Massimiliano Randazzo <massimiliano.randa...@gmail.com> wrote:

> Hi Paras,
>
> thank you for your answer. If you don't mind, I have a couple of
> questions.
>
> I am experiencing very long indexing times. I have 8 servers currently
> working against 1 instance of Solr; I thought of moving to a cloud of 4 Solr
> servers with 3 ZooKeeper servers to distribute the load, but I was
> wondering whether I have to start the indexing over, or whether there is a
> tool to load the index of a single Solr into a SolrCloud, redistributing the load.
>
> Currently in the "managed-schema" file I have configured the fields to be
> indexed with type="text_it", to which "lang/stopwords_it.txt" is assigned.
> I have been asked to remove the stopwords; if I modify the "managed-schema"
> file and remove the stopwords file, is it possible to re-index the database
> without having to reload all the material, using the documents already present?
>
> Thank you
> Massimiliano Randazzo
>
> On Wed, 26 Feb 2020 at 13:26, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> > Hi Massimiliano,
> >
> > Is it still necessary to run the Optimize command from my application
> when
> > > I have finished indexing?
> >
> >
> > I guess you can stop worrying about optimizations and let Solr handle
> that
> > implicitly. There's nothing so bad about having more segments.
> >
> > On Wed, 26 Feb 2020 at 16:02, Massimiliano Randazzo <
> > massimiliano.randa...@gmail.com> wrote:
> >
> > > > Good morning,
> > > >
> > > > I recently went from version 6.4 to version 8.4.1. I access Solr
> > > > through Java applications written by me, in which I have updated the
> > > > solr-solrj-8.4.1.jar libraries.
> > > >
> > > > I am performing the OCR indexing of a newspaper of about 550,000 pages
> > > > in production, for which I have calculated at least 1,000,000,000 words,
> > > > and I am experiencing slowness. I wanted to know whether you could advise
> > > > me on changes to the configuration.
> > > >
> > > > The server I'm using has 12 cores and 64GB of RAM; the only
> > > > changes I made in the configuration are in the solr.in.sh file:
> > > > SOLR_HEAP="20480m"
> > > > SOLR_JAVA_MEM="-Xms20480m -Xmx20480m"
> > > > GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
> > > >   -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> > > > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
> > > > The Java version I use is
> > > > java version "1.8.0_51"
> > > > Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
> > > > Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)
> > > >
> > > > Also, comparing the Solr web interface, I noticed a difference on the
> > > > "Overview" page: Solr 6.4 showed Optimized and Current and
> > > > allowed me to launch Optimize if necessary; in version 8.4.1 Optimized is
> > > > no longer present. I assumed this activity now happens with the commit
> > > > or through some background operation. If that is so, is it still
> > > > necessary to run the Optimize command from my application when I have
> > > > finished indexing? I noticed that the Optimize function requires
> > > > considerable time and resources, especially on large databases.
> > > >
> > > > Thank you for your attention
> > >
> > > Massimiliano Randazzo
> > >
> > > >
> > > >
> > >
> >
> >
> > --
> > --
> > Regards,
> >
> > *Paras Lehana* [65871]
> > Development Engineer, *Auto-Suggest*,
> > IndiaMART InterMESH Ltd,
> >
> > 11th Floor, Tower 2, Assotech Business Cresterra,
> > Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
> >
> > Mob.: +91-9560911996
> > Work: 0120-4056700 | Extn:
> > *1196*
> >
> > --
> >
>
>
> --
> Massimiliano Randazzo
>
> Analista Programmatore,
> Sistemista Senior
> Mobile +39 335 6488039
> email: massimiliano.randa...@gmail.com
> pec: massimiliano.randa...@pec.net
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: Solr index

2019-08-08 Thread Dario Rigolin
Do you know that your Solr instance is open to the internet? It's better to firewall
the port, or at least not to post the full address here...
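To check whether documents are actually reaching the index (rather than only that a
commit returns status 0), you can look at numFound; a small SolrJ sketch, reusing the
"dovecot" core name from the guide and assuming it runs on the Solr host itself:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CheckIndexedCount {
    public static void main(String[] args) throws Exception {
        // Keep Solr bound to localhost or behind a firewall; never expose it publicly.
        String coreUrl = "http://localhost:8987/solr/dovecot";
        try (HttpSolrClient client = new HttpSolrClient.Builder(coreUrl).build()) {
            long numFound = client.query(new SolrQuery("*:*")).getResults().getNumFound();
            System.out.println("Documents in the dovecot core: " + numFound);
        }
    }
}

If numFound stays at 0, Dovecot is not sending anything to Solr yet; the commit and
optimize calls only make already-submitted documents visible.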

On Thu, 8 Aug 2019 at 15:58, HTMLServices.it <i...@htmlservices.it> wrote:

> Hi everyone,
> I installed Solr on a test server (CentOS 7) to get faster searches
> in Dovecot. Solr is new for me and I don't think I've fully understood how it
> works yet.
> I installed it following the official guide on the Dovecot wiki:
> https://wiki2.dovecot.org/Plugins/FTS/Solr
> but I can't get it to work properly.
>
> This is my installation, which I made public provisionally without a
> password:
> http://5.39.2.59:8987/solr/#/
> (I changed the port because the default one was busy)
>
> I believe the index is not being created; should it be created
> automatically, or did I do something wrong?
>
> If I run either of these two commands from the guide
> curl http://5.39.2.59:8987/solr/dovecot/update?optimize=true
> curl http://5.39.2.59:8987/solr/dovecot/update?commit=true
> I get
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">2</int>
>   </lst>
> </response>
>
> Is this right? Have I forgotten something, or am I doing something wrong?
>
> Excuse the basic questions, but this is my first time with Solr.
> Thank you all
>


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin


Re: ExpandComponent not expanding

2015-03-06 Thread Dario Rigolin
I did more testing following your question, and now it all makes sense;
I think a clearer explanation in the documentation could help, though.
I was used to Grouping, where a group is created even if only one element
is present, and I had inferred that the expanded section would show ALL
collapsed records, not only the collapsed records behind each returned
group head.

At this point ExpandComponent works well; sorry for the false alarm.

Regards.

Dario

On 6/03/2015 14:26, Joel Bernstein wrote:

The expand component only displays the group heads when it finds expanded
documents in the group. And it only expands for the current page.

Are you finding situations where there are group heads on the page, that
have child documents that are not being expanded?

Joel Bernstein
Search Engineer at Heliosearch

On Fri, Mar 6, 2015 at 7:17 AM, Dario Rigolin da...@comperio.it wrote:


I'm using Solr 4.10.1 and FieldCollapsing, but when adding expand=true and
activating the ExpandComponent, the expanded section of the result contains
only one group head and not all the group heads present in the result.
I don't know if this is the intended behaviour. Using a query q=*:*, the
expanded section contains more group heads, but still not all 10 group heads
are present. Also, removing the max= parameter on !collapse makes it display
a couple more heads, but not all of them.

Regards

Example of a response with only one group head in the expanded section, while 10
group heads are returned:

<response>
  <script/>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">20</int>
    <lst name="params">
      <str name="expand.rows">2</str>
      <str name="expand.sort">sdate asc</str>
      <str name="fl">id</str>
      <str name="q">title:(test search)</str>
      <str name="expand">true</str>
      <str name="fq">{!collapse field=group_key max=sdate}</str>
    </lst>
  </lst>
  <result name="response" numFound="120" start="0">
    <doc><str name="id">test:catalog:713515</str></doc>
    <doc><str name="id">test:catalog:126861</str></doc>
    <doc><str name="id">test:catalog:88797</str></doc>
    <doc><str name="id">test:catalog:91760</str></doc>
    <doc><str name="id">test:catalog:14095</str></doc>
    <doc><str name="id">test:catalog:60616</str></doc>
    <doc><str name="id">test:catalog:31539</str></doc>
    <doc><str name="id">test:catalog:29449</str></doc>
    <doc><str name="id">test:catalog:146638</str></doc>
    <doc><str name="id">test:catalog:137554</str></doc>
  </result>
  <lst name="expanded">
    <result name="collapse_value_2342" numFound="3" start="0">
      <doc><str name="id">test:catalog:21</str></doc>
      <doc><str name="id">test:catalog:330659</str></doc>
    </result>
  </lst>
  <head/>
</response>





ExpandComponent not expanding

2015-03-06 Thread Dario Rigolin
I'm using Solr 4.10.1 and FieldCollapsing, but when adding expand=true and
activating the ExpandComponent, the expanded section of the result contains
only one group head and not all the group heads present in the result.
I don't know if this is the intended behaviour. Using a query q=*:*, the
expanded section contains more group heads, but still not all 10 group heads
are present. Also, removing the max= parameter on !collapse makes it display
a couple more heads, but not all of them.


Regards

Example of a response with only one group head in the expanded section, while 10
group heads are returned:


<response>
  <script/>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">20</int>
    <lst name="params">
      <str name="expand.rows">2</str>
      <str name="expand.sort">sdate asc</str>
      <str name="fl">id</str>
      <str name="q">title:(test search)</str>
      <str name="expand">true</str>
      <str name="fq">{!collapse field=group_key max=sdate}</str>
    </lst>
  </lst>
  <result name="response" numFound="120" start="0">
    <doc><str name="id">test:catalog:713515</str></doc>
    <doc><str name="id">test:catalog:126861</str></doc>
    <doc><str name="id">test:catalog:88797</str></doc>
    <doc><str name="id">test:catalog:91760</str></doc>
    <doc><str name="id">test:catalog:14095</str></doc>
    <doc><str name="id">test:catalog:60616</str></doc>
    <doc><str name="id">test:catalog:31539</str></doc>
    <doc><str name="id">test:catalog:29449</str></doc>
    <doc><str name="id">test:catalog:146638</str></doc>
    <doc><str name="id">test:catalog:137554</str></doc>
  </result>
  <lst name="expanded">
    <result name="collapse_value_2342" numFound="3" start="0">
      <doc><str name="id">test:catalog:21</str></doc>
      <doc><str name="id">test:catalog:330659</str></doc>
    </result>
  </lst>
  <head/>
</response>
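For reference, the same collapse/expand request can be issued from SolrJ, with the
per-group members read from getExpandedResults(); a sketch with an assumed core URL
and a recent SolrJ:

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class CollapseExpandExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/catalog").build()) {
            SolrQuery q = new SolrQuery("title:(test search)");
            q.addFilterQuery("{!collapse field=group_key max=sdate}"); // one head per group
            q.set("expand", "true");        // expand the collapsed groups of this page
            q.set("expand.rows", "2");
            q.set("expand.sort", "sdate asc");
            q.setFields("id");

            QueryResponse rsp = client.query(q);
            System.out.println("group heads on this page: " + rsp.getResults().size());
            // Only groups that actually have additional members show up here:
            for (Map.Entry<String, SolrDocumentList> e : rsp.getExpandedResults().entrySet()) {
                System.out.println("group " + e.getKey() + ": " + e.getValue().size()
                        + " expanded docs");
            }
        }
    }
}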


Convert XML response into JavaBin encoding

2013-07-26 Thread Dario Rigolin
I'm in the process of creating a service gateway in front of a SQL database that
externally acts as a Solr server. I have implemented the XML, JSON and PHP
response formats, but when used with sharding I'm receiving requests for the javabin
format. Looking into the javadoc I found a JavaBinCodec encoder that receives a
list of Java objects and prepares a Java-serialized reply. My question is whether,
inside the Solr library, an encoder from XML/JSON to JavaBin is already available,
and in which package or class.
I developed my gateway in Perl, and I can embed Java code inside it to call this
Java class if one exists.
If not, I need to implement it myself, and any pointer to existing code inside Solr
would be appreciated (for example, where Solr prepares the response to a query).
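For what it's worth, the encoder does live inside the Solr libraries:
org.apache.solr.common.util.JavaBinCodec serializes the same NamedList structure Solr
uses for its responses. A hedged sketch of producing a javabin payload from scratch
(the response contents below are made up; there is no ready-made XML-to-JavaBin
converter that I know of, so the gateway has to build the NamedList itself):

import java.io.ByteArrayOutputStream;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.util.JavaBinCodec;
import org.apache.solr.common.util.SimpleOrderedMap;

public class JavaBinResponseExample {
    public static void main(String[] args) throws Exception {
        // Build the shape Solr puts in its responses: a responseHeader plus
        // a "response" document list.
        SimpleOrderedMap<Object> header = new SimpleOrderedMap<>();
        header.add("status", 0);
        header.add("QTime", 2);

        SolrDocumentList docs = new SolrDocumentList();
        SolrDocument d = new SolrDocument();
        d.setField("id", "doc1");
        docs.add(d);
        docs.setNumFound(1);
        docs.setStart(0);

        SimpleOrderedMap<Object> response = new SimpleOrderedMap<>();
        response.add("responseHeader", header);
        response.add("response", docs);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new JavaBinCodec().marshal(response, out); // javabin-encoded bytes
        System.out.println("javabin payload: " + out.size() + " bytes");
    }
}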

Thank you.

-
Dario Rigolin
drigo...@gmail.com





TermComponent and Optimize

2012-05-23 Thread Dario Rigolin
We have an issue with TermComponent on Solr 3.6 (and 3.5): using the term list on
the field id (the unique id of the documents), the reply says that we have multiple
documents with the same id!
Doing a search, only one doc is returned, as expected.

After deeper investigation, the issue is fixed by doing an index optimize.
After an optimize, the term list on the id field reports all ids as unique.
If we then update a single document, its id is again listed by TermComponent
as used in two documents.
It seems that TermComponent is looking at all versions of the documents in the
index.

Is this the expected behavior for TermComponent? Any suggestion on how to solve it?
We use TermComponent to do a smart autocomplete on field values.
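For reference, the autocomplete query against the terms component looks roughly like
this from SolrJ (a sketch with a recent SolrJ; the handler path, core URL and field
name are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermsAutocompleteExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SolrQuery q = new SolrQuery();
            q.setRequestHandler("/terms"); // assumes a /terms handler is configured
            q.setTerms(true);
            q.addTermsField("title_autocomplete"); // assumed field
            q.setTermsPrefix("dar");
            q.setTermsLimit(10);

            TermsResponse terms = client.query(q).getTermsResponse();
            for (TermsResponse.Term t : terms.getTerms("title_autocomplete")) {
                // Term statistics are segment-level, so updated/deleted documents
                // can still be counted until their segments are merged away.
                System.out.println(t.getTerm() + " (" + t.getFrequency() + ")");
            }
        }
    }
}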

Thank you

---
Comperio srl
Dario Rigolin 

Re: invert terms in search with exact match

2011-03-24 Thread Dario Rigolin
On Thursday, March 24, 2011 03:52:31 pm Gastone Penzo wrote:

 
 title1: my love darling
 title2: my darling love
 title3: darling my love
 title4: love my darling

Sorry, but simply search for:

title:(my OR love OR darling)

If your default operator is OR, you don't need to put OR in the query.

Best regards.
Dario Rigolin
Comperio srl (Italy)


Error on string searching # [STRANGE]

2011-03-10 Thread Dario Rigolin
I have a text field indexed using WordDelimiter.
Indexed in this way:
<doc>
  <field name="myfield">S.#L.W.VI.37</field>
  ...
</doc>

Searching in this way:
http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)

This produces the error:

org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.': 
Lexical error at line 1, column 17.  Encountered: EOF after : \S.

It seems that # is an invalid character for the query... I tried URL-encoding it,
adding a slash before it, and removing the quotes, but other errors come up:

http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)

org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.': 
Encountered EOF at line 1, column 15.
Was expecting one of:
<AND> ...
<OR> ...
<NOT> ...
"+" ...
"-" ...
"(" ...
")" ...
"*" ...
"^" ...
<QUOTED> ...
<TERM> ...
<FUZZY_SLOP> ...
<PREFIXTERM> ...
<WILDTERM> ...
"[" ...
"{" ...
<NUMBER> ...


Any idea how to solve this?
Maybe a bug? Or probably I'm missing something.
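A client-side workaround is to let SolrJ escape and URL-encode the request instead of
pasting the query into a raw URL; a hedged sketch with a recent SolrJ (the core URL
and field name are taken from the example above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapedQueryExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://192.168.3.3:8983/solr3.1/core0").build()) {
            // escapeQueryChars handles query-parser metacharacters; SolrJ then
            // URL-encodes the whole request, so the '#' no longer truncates the
            // URL the way it does when typed into a browser or curl.
            String value = ClientUtils.escapeQueryChars("S.#L.W.VI.37");
            SolrQuery q = new SolrQuery("myfield:" + value);
            System.out.println("matches: " + client.query(q).getResults().getNumFound());
        }
    }
}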

Dario.


Re: Error on string searching # [STRANGE]

2011-03-10 Thread Dario Rigolin
On Thursday, March 10, 2011 04:53:51 pm Juan Grande wrote:
 I think that the problem is with the # symbol, because it has a special
 meaning when used inside a URL. Try replacing it with %23, like this:
 http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.%23L.W.VI.37)

If I do the URL-encoding, changing the # to %23, I get this error:


3

java.lang.ArrayIndexOutOfBoundsException: 3
at 
org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:185)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:208)
at org.apache.lucene.search.Searcher.search(Searcher.java:88)

 
 Regards,
 *
 Juan G. Grande*
 -- Solr Consultant @ http://www.plugtree.com
 -- Blog @ http://juanggrande.wordpress.com
 
 
 On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin
 
 dario.rigo...@comperio.itwrote:
  I have a text field indexed using WordDelimiter.
  Indexed in this way:
  <doc>
    <field name="myfield">S.#L.W.VI.37</field>
    ...
  </doc>
  
  Serching in that way:
  http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
  
  Makes this error:
  
  org.apache.lucene.queryParser.ParseException: Cannot parse
  'myfield:(S.': Lexical error at line 1, column 17.  Encountered: EOF
  after : \S.
  
  It seems that # is a wrong character for query... I try urlencoding o
  adding a
  slash before or removing quotes but other errors comes:
  
  http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
  
  org.apache.lucene.queryParser.ParseException: Cannot parse 'myfield:(S.':
  Encountered EOF at line 1, column 15.
  
  Was expecting one of:
 AND ...
 OR ...
 NOT ...
 + ...
 - ...
 ( ...
 ) ...
 * ...
 ^ ...
 QUOTED ...
 TERM ...
 FUZZY_SLOP ...
 PREFIXTERM ...
 WILDTERM ...
 [ ...
 { ...
 NUMBER ...
  
  Any idea how to solve this?
  Maybe a bug? Or probably I'm missing something.
  
  Dario.


Re: Error on string searching # [STRANGE] [FIX]

2011-03-10 Thread Dario Rigolin
On Thursday, March 10, 2011 04:58:43 pm Dario Rigolin wrote:

It seems fixed by setting, in the WordDelimiter configuration,

catenateWords="0" catenateNumbers="0"

instead of 1 on both...

Nice to know...


 On Thursday, March 10, 2011 04:53:51 pm Juan Grande wrote:
  I think that the problem is with the # symbol, because it has a special
  meaning when used inside a URL. Try replacing it with %23, like this:
  http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.%23L.W.VI.37)
 
 If I do urlencoding and changing in %23 I get this error
 
 
 3
 
 java.lang.ArrayIndexOutOfBoundsException: 3
   at
 org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhr
 aseQuery.java:185) at
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:208) at
 org.apache.lucene.search.Searcher.search(Searcher.java:88)
 
 
  Regards,
  *
  Juan G. Grande*
  -- Solr Consultant @ http://www.plugtree.com
  -- Blog @ http://juanggrande.wordpress.com
  
  
  On Thu, Mar 10, 2011 at 12:45 PM, Dario Rigolin
  
  dario.rigo...@comperio.itwrote:
   I have a text field indexed using WordDelimiter.
   Indexed in this way:
   <doc>
     <field name="myfield">S.#L.W.VI.37</field>
     ...
   </doc>
   
   Serching in that way:
   http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
   
   Makes this error:
   
   org.apache.lucene.queryParser.ParseException: Cannot parse
   'myfield:(S.': Lexical error at line 1, column 17.  Encountered: EOF
   after : \S.
   
   It seems that # is a wrong character for query... I try urlencoding o
   adding a
   slash before or removing quotes but other errors comes:
   
   http://192.168.3.3:8983/solr3.1/core0/select?q=myfield:(S.#L.W.VI.37)
   
   org.apache.lucene.queryParser.ParseException: Cannot parse
   'myfield:(S.': Encountered EOF at line 1, column 15.
   
   Was expecting one of:
  AND ...
  OR ...
  NOT ...
  + ...
  - ...
  ( ...
  ) ...
  * ...
  ^ ...
  QUOTED ...
  TERM ...
  FUZZY_SLOP ...
  PREFIXTERM ...
  WILDTERM ...
  [ ...
  { ...
  NUMBER ...
   
   Any idea how to solve this?
   Maybe a bug? Or probably I'm missing something.
   
   Dario.


Multivalued field search...

2010-11-18 Thread Dario Rigolin
I think this question is more related to Lucene query syntax, but I'm posting
here because I feel more like a Solr user :-)

I have a multivalued field named field1 containing codes separated by spaces:

<doc>
  <field name="id">doc1</field>
  <field name="field1">A BB1 B BB2 C BB3</field>
  <field name="field1">A CC1 B CC2 C CC3</field>
</doc>
<doc>
  <field name="id">doc2</field>
  <field name="field1">A BB1 B FF2 C FF3</field>
  <field name="field1">A YY1 B BB2 C KK3</field>
</doc>

I would like my query:

q=field1:("A BB1" AND "A BB2")

to return only doc1. At the moment it returns both doc1 and doc2.

Is there any way to force the query to match within a single value of the field,
instead of treating the multivalued field as one long string?
Looking at proximity search, I saw that it works only on the distance between two
terms, not between two phrases.

Any suggestions or ideas?

Thank you.

Dario


Re: Multivalued field search...

2010-11-18 Thread Dario Rigolin
On Thursday, November 18, 2010 12:36:40 pm Dario Rigolin wrote:

Sorry, wrong query; it should be:

q=field1:("A BB1" AND "B BB2")

Dario


[solved] Re: Multivalued field search...

2010-11-18 Thread Dario Rigolin
On Thursday, November 18, 2010 12:42:49 pm Dario Rigolin wrote:
 On Thursday, November 18, 2010 12:36:40 pm Dario Rigolin wrote:
 
 Sorry wrong query:
 
 q=field1:("A BB1" AND "B BB2")
 
 Dario

q=field1:("A BB1 B BB2"~10)

I discovered that proximity search works well with multiple terms
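For the record, the same query through SolrJ (the core URL is an assumption). It works
because the slop of 10 is smaller than the positionIncrementGap between the values of
a multivalued field (100 by default), so the proximity match cannot span two different
values of field1:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SameValueProximityExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            // "A BB1" and "B BB2" must occur within 10 positions of each other,
            // which can only happen inside a single value: only doc1 matches.
            SolrQuery q = new SolrQuery("field1:\"A BB1 B BB2\"~10");
            System.out.println(client.query(q).getResults().getNumFound());
        }
    }
}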

Ciao.

Dario.


DIH URLDataSource and useSolrAddSchema=true

2010-11-15 Thread Dario Rigolin
I'm looking at indexing data in Solr using a PHP page that feeds the index.
In my application I already have all docs converted to a Solr <add> XML
document, and I need Solr to be able to pull all changed documents into the
index. Looking at DIH, I decided to use URLDataSource with useSolrAddSchema=true,
pointing to my application URL: getchangeddocstoindex.php.

But my PHP page could stream hundreds of megabytes (maybe a couple of gigabytes!).
Does anybody know whether I need to adapt connectionTimeout and readTimeout in any
way?

Looking at the URLDataSource documentation, it seems possible to
implement a kind of chunking using a Transformer and $hasMore and $nextURL.

But with useSolrAddSchema I don't know how to set up a Transformer section.

My questions are:
1) Is there a size limit beyond which it's better to chunk?
2) Is it possible to do chunking with useSolrAddSchema=true?

Thanks


Dario.


Re: DIH URLDataSource and useSolrAddSchema=true

2010-11-15 Thread Dario Rigolin
On Monday, November 15, 2010 11:18:47 am Lance Norskog wrote:
 This is more complex than you need. The Solr update command can accept
 streamed data, with the stream.url and stream.file options. You can just
 use solr/update with stream.url=http://your.machine/your.php.script and
 it will read as fast as it wants.
 There is no parallel indexing support, but you will find that indexing
 in this way is generally disk-bound, not processor-bound.

Yes, this is the way I'm using it at the moment (with stream.file), but I
wasn't sure it would also work with a URL... Thank you for the reassurance.
The idea behind moving to DIH was the hope of getting a kind of scheduler on the
Solr side, to avoid two clients asking for the same indexing task.
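For illustration, the remote-streaming call Lance describes is just an HTTP request
to /update with a stream.url parameter; a hedged sketch using plain JDK HTTP (the
URLs are placeholders, and remote streaming must be enabled in solrconfig.xml via
enableRemoteStreaming="true" on the requestParsers element):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class RemoteStreamUpdate {
    public static void main(String[] args) throws IOException {
        // Solr pulls the <add><doc>... stream from the PHP page at its own pace,
        // so huge feeds never have to be buffered by the client.
        String feed = URLEncoder.encode("http://your.machine/getchangeddocstoindex.php", "UTF-8");
        URL url = new URL("http://localhost:8983/solr/update?stream.url=" + feed
                + "&commit=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}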

Thank you.

Dario.


DataImporter using pure solr add XML

2010-10-25 Thread Dario Rigolin
Looking at the DataImporter, I'm not sure whether it's possible to import using a
standard <add><doc>... XML document representing a document add operation.
Generating the <add><doc> XML is quite expensive in my application, and I have cached
all those documents in a text column of a MySQL database.
It would be easier for me to push all updated documents directly from the
database instead of going through multiple XML files posted in stream mode to
Solr.

Thank you.

Dario.