There must be an explanation for 83 MB of compressed data almost doubling
in size. It doesn't make sense at all.
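
A rough way to see which of the two figures is plausible is to total up the
compressed payloads outside Lucene and compare that with the index size on
disk. Below is a minimal sketch, assuming the documents are available as
Python strings. The names `compressed_payload_size` and `sample_texts` are
hypothetical and only for illustration; they are not part of PyLucene. The
index holds more than the stored field values, so the comparison is only
approximate.

```python
import zlib


def compressed_payload_size(texts):
    """Return (raw_bytes, compressed_bytes) totals for an iterable of strings."""
    raw_total = 0
    compressed_total = 0
    for text in texts:
        raw = text.encode("utf-8")
        packed = zlib.compress(raw)
        # Sanity check: the payload must round-trip before it is worth storing.
        assert zlib.decompress(packed) == raw
        raw_total += len(raw)
        compressed_total += len(packed)
    return raw_total, compressed_total


# Placeholder data (an assumption); substitute the real document texts.
sample_texts = ["long text " * 1000, "another document " * 500]
raw_total, compressed_total = compressed_payload_size(sample_texts)
print(f"raw: {raw_total} bytes, compressed: {compressed_total} bytes")
```

If the compressed total comes out close to one of the two index sizes, that
would suggest which variant actually stores the full binary payload.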

On Sat, Oct 26, 2024 at 7:03 PM Andi Vajda <va...@apache.org> wrote:

>
> > On Oct 26, 2024, at 14:50, Prashant Saxena <animator...@gmail.com> wrote:
> >
> > I just need to store compressed strings to save space. If it can be done
> > in any other way, I'm OK with that.
>
> The JArray('byte') is the way.
>
> Andi..
>
> >
> >
> >> On Sat, Oct 26, 2024 at 6:11 PM Andi Vajda <va...@apache.org> wrote:
> >>
> >>
> >>> On Sat, 26 Oct 2024, Prashant Saxena wrote:
> >>>
> >>> PyLucene 10.0.0
> >>>
> >>> I'm trying to store a long text by compressing it first using zlib
> >>>
> >>> *doc.add(StoredField("contents", zlib.compress(ftext.encode('utf-8'))))*
> >>>
> >>> The resulting index size is *~83 MB*. When reading its value back using
> >>>
> >>> *c = doc.getBinaryValue("contents")*
> >>>
> >>> It's returning 'NoneType' and when using
> >>>
> >>> *c = doc.get("contents")*
> >>>
> >>> It's returning a string which cannot be decompressed.
> >>>
> >>> When using
> >>>
> >>> *doc.add(StoredField("contents", JArray('byte')(zlib.compress(ftext.encode('utf-8')))))*
> >>>
> >>> The resulting index size is *~160 MB*. There is no problem in getting
> >>> its value using
> >>>
> >>>
> >>>
> >>> *c = doc.getBinaryValue("contents")*
> >>> *cc = zlib.decompress(c.bytes.bytes_).decode('utf-8')*
> >>>
> >>> *Question 1:* Why does the index size almost double when using JArray?
> >>
> >> Because the value you're passing is actually processed correctly?
> >>
> >>> *Question 2:* How do you correctly create and store compressed binary
> >>> data in StoredField?
> >>
> >> If you want a Python bytes object, like b'abcd', to be seen by Lucene
> >> (Java) as a byte array, you should wrap it with a JArray('byte') like
> >> you did. Otherwise, it's seen as a string (I need to double-check) and
> >> not handled correctly.
> >>
> >>> I am using PyLucene in my current project. Please advise me if I should
> >>> post my questions on the java-user list instead of here.
> >>
> >> This particular question is specific to PyLucene and should be asked
> >> here, like you did ;-)
> >>
> >> Andi..
> >>
>
>
