Re: SQL equality predicate escaping single quotes

2019-08-09 Thread Joel Bernstein
It does appear that single quotes are being removed. If you want to provide
a patch that allows single quotes to get passed through, I can help with
testing and committing.


On Thu, Aug 8, 2019 at 11:28 AM Kyle Lilly  wrote:

> Hi,
>
> When using the SQL handler is there any way to escape single quotes in
> boolean predicates? A query like:
>
> SELECT title FROM books WHERE author_lastname = 'O''Reilly'
>
> Will return no results for authors with the last name "O'Reilly" but will
> return hits for books with a last name of "OReilly". I can perform a
> standard Solr term search using "lastname:O'Reilly" and get back the
> expected results. Looking through the code it appears all single quotes are
> stripped from term values in the SQL handler -
>
> https://github.com/apache/lucene-solr/blame/1d85cd783863f75cea133fb9c452302214165a4d/solr/core/src/java/org/apache/solr/handler/sql/SolrFilter.java#L136
> .
> If this is by design is there any way to use single quotes in a term
> predicate with SQL?
>
> Thanks.
>
> - Kyle
>
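[A side note on the syntax itself: doubling the single quote, as in the query above, is the standard SQL escape for a literal quote, so the statement is well-formed and the empty result comes from the quote-stripping discussed above. A minimal, stdlib-only sketch of URL-encoding such a statement for the /sql endpoint — the host, port, and "books" collection name are assumptions, not from the thread:]

```java
import java.net.URLEncoder;

public class SqlQuoteExample {
    public static void main(String[] args) throws Exception {
        // Doubled '' is the standard SQL escape for a literal single quote.
        String stmt = "SELECT title FROM books WHERE author_lastname = 'O''Reilly'";
        // Hypothetical endpoint; host and collection name are placeholders.
        String url = "http://localhost:8983/solr/books/sql?stmt="
                + URLEncoder.encode(stmt, "UTF-8");
        System.out.println(url);
    }
}
```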


Re: Enumerating cores via SolrJ

2019-08-09 Thread Shawn Heisey

On 8/9/2019 3:07 PM, Mark H. Wood wrote:

Did I miss something, or is there no way, using SolrJ, to enumerate
loaded cores, as:

   curl 'http://solr.example.com:8983/solr/admin/cores?action=STATUS'

does?


This code will do so.  I tested it.

  // imports needed (from solr-solrj and the JDK):
  // import java.io.IOException;
  // import org.apache.solr.client.solrj.SolrServerException;
  // import org.apache.solr.client.solrj.impl.HttpSolrClient;
  // import org.apache.solr.client.solrj.request.CoreAdminRequest;
  // import org.apache.solr.client.solrj.response.CoreAdminResponse;
  // import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;
  // import org.apache.solr.common.util.NamedList;

  public static void main(String[] args) throws SolrServerException, IOException {
    String url = "http://localhost:8983/solr";
    HttpSolrClient client = new HttpSolrClient.Builder().withBaseSolrUrl(url).build();

    // Ask the CoreAdmin API for STATUS, then walk the per-core entries.
    CoreAdminRequest req = new CoreAdminRequest();
    req.setAction(CoreAdminAction.STATUS);
    CoreAdminResponse rsp = req.process(client);
    NamedList full = rsp.getResponse();
    NamedList status = (NamedList) full.get("status");
    int count = status.size();
    for (int i = 0; i < count; i++) {
      String coreName = status.getName(i);
      System.out.println("core: " + coreName);
    }
  }

It's possible that NamedList has a cleaner way of doing this.  I went 
with what I know. :)


Note that the URL used to create the client object must end with /solr 
for this to work.  If you try a URL that ends with /solr/corename it 
won't work.  I like to use a client ending with /solr for *all* 
requests, and tell it what core I want the request to go to.


I would also not expect this to work with CloudSolrClient.

Thanks,
Shawn


Re: Enumerating cores via SolrJ

2019-08-09 Thread Brian Lininger
You can extend  org.apache.solr.client.solrj.request.CoreAdminRequest to do
exactly what you're asking for

On Fri, Aug 9, 2019 at 2:07 PM Mark H. Wood  wrote:

> Did I miss something, or is there no way, using SolrJ, to enumerate
> loaded cores, as:
>
>   curl 'http://solr.example.com:8983/solr/admin/cores?action=STATUS'
>
> does?
>
> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu
>


-- 


*Brian Lininger*
Technical Architect, Infrastructure & Search
*Veeva Systems *
brian.linin...@veeva.com
www.veeva.com

*This email and the information it contains are intended for the intended
recipient only, are confidential and may be privileged information exempt
from disclosure by law.*
*If you have received this email in error, please notify us immediately by
reply email and delete this message from your computer.*
*Please do not retain, copy or distribute this email.*


Enumerating cores via SolrJ

2019-08-09 Thread Mark H. Wood
Did I miss something, or is there no way, using SolrJ, to enumerate
loaded cores, as:

  curl 'http://solr.example.com:8983/solr/admin/cores?action=STATUS'

does?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




install Solr keeping S3 as storage(neither OS file system not hdfs).

2019-08-09 Thread Suryakant Jadhav
Hi, 

I am trying to configure Solr with S3.
Could you please guide me through the step-by-step configuration for setting this up?
Can you see if we can install Solr keeping S3 as storage (neither the OS file
system nor HDFS)?

Best Regards,
Suryakant



Re: Indexed Data Size

2019-08-09 Thread Shawn Heisey

On 8/9/2019 12:17 PM, Moyer, Brett wrote:

The biggest is /data/solr/system_logs_shard1_replica_n1/data/index, files with 
the extensions I stated previously. Each is 5 GB and there are a few hundred, 
dated back to the last 3 months. I don’t understand why there are so many files 
with such small indexes. Not sure how to clean them up.


Can you get a screenshot of the core overview for that particular core? 
Solr should correctly calculate the size on the overview based on what 
files are actually in the index directory.


Thanks,
Shawn


RE: Indexed Data Size

2019-08-09 Thread Moyer, Brett
Correct, our indexes are small document-wise, but for some reason we have a 
year's worth of files in the data/solr folders. There are no index.<timestamp> 
files.

The biggest is /data/solr/system_logs_shard1_replica_n1/data/index, files with 
the extensions I stated previously. Each is 5 GB and there are a few hundred, 
dated back to the last 3 months. I don’t understand why there are so many files 
with such small indexes. Not sure how to clean them up. 

-Original Message-
From: Shawn Heisey  
Sent: Friday, August 9, 2019 9:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexed Data Size

On 8/9/2019 6:12 AM, Moyer, Brett wrote:
> Thanks! We update each index nightly, we don’t clear, but bring in New and 
> Deltas, delete expired/404. All our data are basically webpages, so none are 
> very large. Some PDFs but again not too large. We are running Solr 7.5, 
> hopefully you can access the links.

Solr is saying that the entire size of the index directory is 95 MB for one of 
those indexes and the other is 30 MB.  Those sound to me like very small 
indexes, not very large like you indicated.  You were saying that the large 
files were in data/index, and did not mention anything about index.<timestamp> 
directories.

If you do have a bunch of index.<timestamp> directories in the "Data" 
directory mentioned on the Core overview page, you can safely delete all of the 
index and/or index.* directories under that directory EXCEPT the one that is 
indicated as the "Index" directory.  If you delete that one, you're deleting 
the actual live index ... and since you're not on Windows, the OS will let you 
delete it without complaining.

The directory locations are cut off on both screenshots, so I can't confirm 
anything there.

The larger core has about 2000 deleted docs and the smaller one has 40. 
Doing an optimize will not save much disk space or take very long.

Thanks,
Shawn
*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA
*


Solr restricting time-consuming/heavy processing queries

2019-08-09 Thread Mark Robinson
Hello,
I have the following questions please:-

In solrconfig.xml I created a new "/selecttimeout" handler copying
"/select" handler and added the following to my new "/selecttimeout":-

  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeOut">10</int>
    <int name="connTimeOut">20</int>
  </shardHandlerFactory>

1.
Does the above mean that if I don't get a request once in 10 ms on the socket
handling the /selecttimeout handler, that socket will be closed?

2.
Same with connTimeOut? I.e., the connection object remains live only if a
connection request comes at least once every 20 ms; if not, the object
gets closed?

Suppose a time-consuming query (say, with lots of facets etc.) is fired
against Solr. How can I make sure Solr does not spend more than 1 s processing it?

3.
Is this achieved by setting timeAllowed=1000?  Or are there any other ways
to do this in Solr?
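[On question 3, a sketch of what that could look like, stepping outside the thread: timeAllowed can also be set as a handler default rather than per query. The handler name is reused from the message above; the value and placement are illustrative. Note timeAllowed bounds the time spent collecting results (the response is then flagged with partialResults=true), not every stage of query processing:]

```xml
<requestHandler name="/selecttimeout" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- illustrative: stop collecting results after ~1000 ms -->
    <int name="timeAllowed">1000</int>
  </lst>
</requestHandler>
```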

4.
For the same purpose of preventing heavy queries from overloading Solr, does the
shardHandlerFactory config above help in any way, or does the shardHandler have
nothing to restrict a query once it is fired against Solr?


Could someone pls share your views?

Thanks!
Mark


Re: [jira] [Commented] (SOLR-9952) S3BackupRepository can install Solr keeping S3 as storage(neither OS filesystem not hdfs).

2019-08-09 Thread Shawn Heisey

On 8/9/2019 10:16 AM, Suryakant Jadhav wrote:

I am trying to configure Solr with S3.
Could you please guide me through the step-by-step configuration for setting this up?
Can you see if we can install Solr keeping S3 as storage (neither the OS file
system nor HDFS)?


Changes for that issue have not yet been committed, so unless you want 
to patch Solr source code yourself and compile a custom build, you can't 
use it yet.  It wouldn't surprise me to find that the patches won't 
apply cleanly to current Solr code, or to find that it doesn't even 
work.  I don't know enough about that part of the code to even begin 
investigating.


The issue has been open for about two and a half years and is currently 
unassigned, which probably means that nobody is actively working on 
getting it committed.


There has been recent activity (in the last couple of months) on the 
issue from someone outside the project who was able to get it working by 
fiddling with the code.  I don't think they have shared their changes -- 
the attachments on the issue are all from February 2017.


If you are proficient with Java code related to S3, we welcome feedback 
about the code and contributions.


Thanks,
Shawn


RE: [jira] [Commented] (SOLR-9952) S3BackupRepository can install Solr keeping S3 as storage(neither OS filesystem not hdfs).

2019-08-09 Thread Suryakant Jadhav
Hi, 

I am trying to configure Solr with S3.
Could you please guide me through the step-by-step configuration for setting this up?
Can you see if we can install Solr keeping S3 as storage (neither the OS file
system nor HDFS)?

Best Regards,
Suryakant 

-Original Message-
From: Kevin Risden (JIRA)  
Sent: 09 August 2019 20:17
To: Suryakant Jadhav 
Subject: [jira] [Commented] (SOLR-9952) S3BackupRepository


[ 
https://issues.apache.org/jira/browse/SOLR-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903955#comment-16903955
 ] 

Kevin Risden commented on SOLR-9952:


[~suryakant.jadhav] - this is the wrong place to ask. Use the solr-user mailing 
list for questions [1]. Solr 4.10.3 is old and most likely will not work 
backing up to S3.

[1] https://lucene.apache.org/solr/community.html#mailing-lists-irc

> S3BackupRepository
> --
>
> Key: SOLR-9952
> URL: https://issues.apache.org/jira/browse/SOLR-9952
> Project: Solr
>  Issue Type: New Feature
>  Components: Backup/Restore
>Reporter: Mikhail Khludnev
>Priority: Major
> Attachments: 
> 0001-SOLR-9952-Added-dependencies-for-hadoop-amazon-integ.patch, 
> 0002-SOLR-9952-Added-integration-test-for-checking-backup.patch, Running Solr 
> on S3.pdf, core-site.xml.template
>
>
> I'd like to have a backup repository implementation allows to snapshot to AWS 
> S3



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: Searches across Cores

2019-08-09 Thread Nicolas Franck
He's right. The "shards" parameter has been available for a very long time,
even from before SolrCloud existed.

e.g. http://localhost:8983/solr/core0/select

with parameters:

  shards = 
localhost:8983/solr/core0,example.com:8983/solr/core0
  q = *:*
  defType = lucene

Yes, I used the same core name twice (in the path and in the parameter), but I do 
not see another way. You need to start the query at some core's query handler.

I guess your data is generated by several parties,
each on their own core? That makes sense.



On 9 Aug 2019, at 19:21, Vadim Ivanov <vadim.iva...@spb.ntk-intourist.ru> wrote:


Maybe consider having one collection with implicit sharding?
This way you can have all the advantages of SolrCloud and can control the content 
of each core "manually" as well as query them independently (&distrib=false)
... or some of them using &shards=core1,core2 as was proposed before.
Quote from doc
" If you created the collection and defined the "implicit" router at the time 
of creation, you can additionally define a router.field parameter to use a 
field from each document to identify a shard where the document belongs. If the 
field specified is missing in the document, however, the document will be 
rejected. You could also use the _route_ parameter to name a specific shard."
--
Vadim


-Original Message-
From: Komal Motwani [mailto:motwani.ko...@gmail.com]
Sent: Friday, August 09, 2019 7:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Searches across Cores

For some good reasons, SolrCloud is not an option for me.
I need to run nested graph queries so firing parallel queries and taking
union/intersection won't work.
I am aware of achieving this via shards however I am looking for ways to
achieve this via multiple cores. We already have data existing in multiple
cores on which i need to add this feature.

Thanks,
Komal Motwani

On Fri, Aug 9, 2019 at 8:57 PM Erick Erickson <erickerick...@gmail.com> wrote:

So my question is why do you have individual cores? Why not use SolrCloud
and collections and have this happen automatically?

There may be very good reasons, this is more if a sanity check….

On Aug 9, 2019, at 8:02 AM, Jan Høydahl <jan@cominvent.com> wrote:

USE request param &shards=core1,core2 or if on separate machines
host:port/solr/core1,host:port/solr/core2

Jan Høydahl

9. aug. 2019 kl. 11:23 skrev Komal Motwani <motwani.ko...@gmail.com>:

Hi,



I have a use case where I would like a query to span across Cores
(Multi-Core); all the cores involved do have same schema. I have started
using solr just recently and have been trying to find ways to achieve
this
but couldn’t find any solution so far (Distributed searches, shards are
not
what I am looking for). I remember in one of the tech talks, there was a
mention of this feature to be included in future releases. Appreciate
any
pointers to help me progress further.



Thanks,

Komal Motwani






RE: Searches across Cores

2019-08-09 Thread Vadim Ivanov


Maybe consider having one collection with implicit sharding?
This way you can have all the advantages of SolrCloud and can control the content 
of each core "manually" as well as query them independently (&distrib=false)
... or some of them using &shards=core1,core2 as was proposed before.
Quote from doc
" If you created the collection and defined the "implicit" router at the time 
of creation, you can additionally define a router.field parameter to use a 
field from each document to identify a shard where the document belongs. If the 
field specified is missing in the document, however, the document will be 
rejected. You could also use the _route_ parameter to name a specific shard."
-- 
Vadim


> -Original Message-
> From: Komal Motwani [mailto:motwani.ko...@gmail.com]
> Sent: Friday, August 09, 2019 7:57 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Searches across Cores
> 
> For some good reasons, SolrCloud is not an option for me.
> I need to run nested graph queries so firing parallel queries and taking
> union/intersection won't work.
> I am aware of achieving this via shards however I am looking for ways to
> achieve this via multiple cores. We already have data existing in multiple
> cores on which i need to add this feature.
> 
> Thanks,
> Komal Motwani
> 
> On Fri, Aug 9, 2019 at 8:57 PM Erick Erickson 
> wrote:
> 
> > So my question is why do you have individual cores? Why not use SolrCloud
> > and collections and have this happen automatically?
> >
> > There may be very good reasons, this is more if a sanity check….
> >
> > > On Aug 9, 2019, at 8:02 AM, Jan Høydahl  wrote:
> > >
> > > USE request param &shards=core1,core2 or if on separate machines
> > host:port/solr/core1,host:port/solr/core2
> > >
> > > Jan Høydahl
> > >
> > >> 9. aug. 2019 kl. 11:23 skrev Komal Motwani :
> > >>
> > >> Hi,
> > >>
> > >>
> > >>
> > >> I have a use case where I would like a query to span across Cores
> > >> (Multi-Core); all the cores involved do have same schema. I have started
> > >> using solr just recently and have been trying to find ways to achieve
> > this
> > >> but couldn’t find any solution so far (Distributed searches, shards are
> > not
> > >> what I am looking for). I remember in one of the tech talks, there was a
> > >> mention of this feature to be included in future releases. Appreciate
> > any
> > >> pointers to help me progress further.
> > >>
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Komal Motwani
> >
> >



Re: Searches across Cores

2019-08-09 Thread Komal Motwani
For some good reasons, SolrCloud is not an option for me.
I need to run nested graph queries so firing parallel queries and taking
union/intersection won't work.
I am aware of achieving this via shards however I am looking for ways to
achieve this via multiple cores. We already have data existing in multiple
cores on which i need to add this feature.

Thanks,
Komal Motwani

On Fri, Aug 9, 2019 at 8:57 PM Erick Erickson 
wrote:

> So my question is why do you have individual cores? Why not use SolrCloud
> and collections and have this happen automatically?
>
> There may be very good reasons, this is more if a sanity check….
>
> > On Aug 9, 2019, at 8:02 AM, Jan Høydahl  wrote:
> >
> > USE request param &shards=core1,core2 or if on separate machines
> host:port/solr/core1,host:port/solr/core2
> >
> > Jan Høydahl
> >
> >> 9. aug. 2019 kl. 11:23 skrev Komal Motwani :
> >>
> >> Hi,
> >>
> >>
> >>
> >> I have a use case where I would like a query to span across Cores
> >> (Multi-Core); all the cores involved do have same schema. I have started
> >> using solr just recently and have been trying to find ways to achieve
> this
> >> but couldn’t find any solution so far (Distributed searches, shards are
> not
> >> what I am looking for). I remember in one of the tech talks, there was a
> >> mention of this feature to be included in future releases. Appreciate
> any
> >> pointers to help me progress further.
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Komal Motwani
>
>


Re: Searches across Cores

2019-08-09 Thread Erick Erickson
So my question is why do you have individual cores? Why not use SolrCloud and 
collections and have this happen automatically?

There may be very good reasons, this is more if a sanity check….

> On Aug 9, 2019, at 8:02 AM, Jan Høydahl  wrote:
> 
> USE request param &shards=core1,core2 or if on separate machines 
> host:port/solr/core1,host:port/solr/core2
> 
> Jan Høydahl
> 
>> 9. aug. 2019 kl. 11:23 skrev Komal Motwani :
>> 
>> Hi,
>> 
>> 
>> 
>> I have a use case where I would like a query to span across Cores
>> (Multi-Core); all the cores involved do have same schema. I have started
>> using solr just recently and have been trying to find ways to achieve this
>> but couldn’t find any solution so far (Distributed searches, shards are not
>> what I am looking for). I remember in one of the tech talks, there was a
>> mention of this feature to be included in future releases. Appreciate any
>> pointers to help me progress further.
>> 
>> 
>> 
>> Thanks,
>> 
>> Komal Motwani



Re: Indexed Data Size

2019-08-09 Thread Shawn Heisey

On 8/9/2019 6:12 AM, Moyer, Brett wrote:

Thanks! We update each index nightly, we don’t clear, but bring in New and 
Deltas, delete expired/404. All our data are basically webpages, so none are 
very large. Some PDFs but again not too large. We are running Solr 7.5, 
hopefully you can access the links.


Solr is saying that the entire size of the index directory is 95 MB for 
one of those indexes and the other is 30 MB.  Those sound to me like 
very small indexes, not very large like you indicated.  You were saying 
that the large files were in data/index, and did not mention anything 
about index.<timestamp> directories.


If you do have a bunch of index.<timestamp> directories in the "Data" 
directory mentioned on the Core overview page, you can safely delete all 
of the index and/or index.* directories under that directory EXCEPT the 
one that is indicated as the "Index" directory.  If you delete that one, 
you're deleting the actual live index ... and since you're not on 
Windows, the OS will let you delete it without complaining.


The directory locations are cut off on both screenshots, so I can't 
confirm anything there.


The larger core has about 2000 deleted docs and the smaller one has 40. 
Doing an optimize will not save much disk space or take very long.


Thanks,
Shawn


RE: Indexed Data Size

2019-08-09 Thread Moyer, Brett
Thanks! We update each index nightly, we don’t clear, but bring in New and 
Deltas, delete expired/404. All our data are basically webpages, so none are 
very large. Some PDFs but again not too large. We are running Solr 7.5, 
hopefully you can access the links.

https://www.dropbox.com/s/lzd6hkoikhagujs/CoreOne.png?dl=0
https://www.dropbox.com/s/ae6rayb38q39u9c/CoreTwo.png?dl=0

Brett
-Original Message-
From: Erick Erickson  
Sent: Thursday, August 8, 2019 5:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexed Data Size

On the surface, this makes no sense at all, so there’s something I don’t 
understand here ;). 

How often do you update your index? Having files from a long time ago is 
perfectly reasonable if you’re not updating regularly.

But your statement that some of these are huge for just a 50K document index is 
odd unless they’re _huge_ documents.

I wouldn’t optimize, unless you’re on Solr 7.5+ as that’ll create a single 
segment, see: 
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
and
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

The extensions you mentioned are perfectly reasonable. Each segment is made up 
of multiple files. .fdt for instance contains stored data. See: 
https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/lucene62/package-summary.html

Can you give us a long listing of one of your index directories?

Best,
Erick

> On Aug 8, 2019, at 5:17 PM, Moyer, Brett  wrote:
> 
> In our data/solr/<corename>/data/index on the filesystem, we have files 
> that go back 1 year. I don’t understand why and I doubt they are in use. 
> Files with extensions like fdx,cfe,doc,pos,tip,dvm etc. Some of these are 
> very large and running us out of server space. Our search indexes themselves 
> are not large, in total we might have 50k documents.  How can I reduce this 
> /data/solr space? Is this what the Solr Optimize command is for? Thanks!
> 
> Brett
> 



Re: Searches across Cores

2019-08-09 Thread Jan Høydahl
USE request param &shards=core1,core2 or if on separate machines 
host:port/solr/core1,host:port/solr/core2

Jan Høydahl

> 9. aug. 2019 kl. 11:23 skrev Komal Motwani :
> 
> Hi,
> 
> 
> 
> I have a use case where I would like a query to span across Cores
> (Multi-Core); all the cores involved do have same schema. I have started
> using solr just recently and have been trying to find ways to achieve this
> but couldn’t find any solution so far (Distributed searches, shards are not
> what I am looking for). I remember in one of the tech talks, there was a
> mention of this feature to be included in future releases. Appreciate any
> pointers to help me progress further.
> 
> 
> 
> Thanks,
> 
> Komal Motwani


Re: Searches across Cores

2019-08-09 Thread Sidharth Negi
Hi,

If the number of cores spanned is low, I guess firing parallel queries and
taking union or intersection should work since their schema is the same. Do
you notice any perceivable difference in performance?

Best,
Sidharth

On Fri, Aug 9, 2019 at 2:54 PM Komal Motwani 
wrote:

> Hi,
>
>
>
> I have a use case where I would like a query to span across Cores
> (Multi-Core); all the cores involved do have same schema. I have started
> using solr just recently and have been trying to find ways to achieve this
> but couldn’t find any solution so far (Distributed searches, shards are not
> what I am looking for). I remember in one of the tech talks, there was a
> mention of this feature to be included in future releases. Appreciate any
> pointers to help me progress further.
>
>
>
> Thanks,
>
> Komal Motwani
>


Searches across Cores

2019-08-09 Thread Komal Motwani
Hi,



I have a use case where I would like a query to span across Cores
(Multi-Core); all the cores involved do have same schema. I have started
using solr just recently and have been trying to find ways to achieve this
but couldn’t find any solution so far (Distributed searches, shards are not
what I am looking for). I remember in one of the tech talks, there was a
mention of this feature to be included in future releases. Appreciate any
pointers to help me progress further.



Thanks,

Komal Motwani


Query field alias - issue with circular reference

2019-08-09 Thread Jaroslaw Rozanski
Hi Folks,



Question about query field aliases.



Assuming one has fields:

 * foo1
 * foo2
Sending "defType=edismax&q=foo:hello&f.foo.qf=foo1 foo2" will work.



But what in case of, when one has fields:

 * foo
 * foo1
Say we want to add behaviour to queries that are already in use. We want to 
search in existing "foo" and "foo1" without making query changes.



Sending "defType=edismax&q=foo:hello&f.foo.qf=foo foo1" will *not* work. The 
error is "org.apache.solr.search.SyntaxError: Field aliases lead to a cycle".



So, is there any way to extend the search query for the existing field without 
modifying the index?


--
Jaroslaw Rozanski | m...@jarekrozanski.eu
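[One avenue that may be worth testing — an assumption on the editor's part, not something confirmed in this thread: the cycle is reported because the alias name "foo" is itself one of the real fields in its own qf list. Aliasing under a fresh virtual name that is not a real field avoids the cycle, though it does mean touching the query once:]

```
defType=edismax&q=foo_all:hello&f.foo_all.qf=foo foo1
```

[Here "foo_all" is hypothetical and must not exist as an actual field in the schema.]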