Re: Sorting keys with HFile.Writer

2019-01-02 Thread Stack
On Tue, Jan 1, 2019 at 8:42 AM Mike Thomsen  wrote:

> Thanks. Is there any value in me specifying the comparator?
>
>
Do you want a sort other than the natural HBase sort? It is possible to
specify your own comparator, but you won't be able to put the resulting
hfiles under HBase for it to serve if the comparators do not align.

S
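
To make "align" concrete, here is a minimal sketch (assuming the HBase 1.2.x
API; the class and method names are made up for illustration): sort the cells
with the very same comparator you hand to the writer, and keep the default
KeyValue.COMPARATOR if HBase should later serve the file.

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.HFile;

public class AlignedAppend {
  // Appends cells in the order of the default HBase comparator. This is only
  // safe if the writer was created with the same comparator
  // (.withComparator(KeyValue.COMPARATOR)); a custom comparator would need a
  // matching custom sort, and HBase could not serve the resulting file.
  static void appendSorted(HFile.Writer writer, List<KeyValue> cells) throws IOException {
    Collections.sort(cells, KeyValue.COMPARATOR);
    for (KeyValue kv : cells) {
      writer.append(kv);
    }
  }
}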




> On Tue, Jan 1, 2019 at 9:07 AM 张铎(Duo Zhang) 
> wrote:
>
> > Yes, the keys should be sorted before passing them to HFile.Writer. The
> > way we build the index for an HFile is based on the assumption that the
> > keys are sorted.
> >
> > Mike Thomsen wrote on Tue, Jan 1, 2019 at 9:36 PM:
> >
> > > I took some of the code from the JUnit test for setting up an HFile
> > > writer and it looked like it should work.
> > >
> > > Path f = new Path("/", "test-something.hfile");
> > > HFileContext context = new HFileContextBuilder()
> > >     .withBlockSize(4096)
> > >     .withIncludesTags(true)
> > >     .withCompression(Compression.Algorithm.NONE).build();
> > > HFile.Writer w = HFile.getWriterFactory(conf, cacheConf)
> > >     .withPath(FileSystem.get(conf), f)
> > >     .withFileContext(context)
> > >     .withComparator(KeyValue.COMPARATOR)
> > >     .create();
> > > // KeyValue(byte[] row, byte[] family, byte[] qualifier, long timestamp, byte[] value)
> > > Map records = new HashMap();
> > > for (int x = 0; x < 1; x++) {
> > >     String uuid = UUID.randomUUID().toString();
> > >     KeyValue value = new KeyValue(uuid.getBytes(), "some-fam".getBytes(),
> > >         "test".getBytes(), System.currentTimeMillis(), "hi".getBytes());
> > >     w.append(value);
> > > }
> > >
> > > That threw this exception:
> > >
> > > java.io.IOException: Added a key not lexically larger than previous.
> > > Current cell = 8faf82ba-5fed-4332-9731-0ebf4d4494f9/some-fam:test/1546348994160/Put/vlen=2/seqid=0,
> > > lastCell = e32ec727-d946-4f67-a3c6-315a27c76408/some-fam:test/1546348994160/Put/vlen=2/seqid=0
> > > at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
> > > at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:267)
> > > at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> > > at HFileTest.main(HFileTest.java:39)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > at java.lang.reflect.Method.invoke(Method.java:498)
> > > at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
> > > at java.lang.Thread.run(Thread.java:748)
> > >
> > > When I manually sorted the keys, it worked fine for creating and
> > > closing the new HFile. Did I miss something or is this expected
> > > behavior from the writer? I'm testing this against HBase 1.2.9.
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> >
>


Re: Sorting keys with HFile.Writer

2019-01-01 Thread Mike Thomsen
Thanks. Is there any value in me specifying the comparator?

On Tue, Jan 1, 2019 at 9:07 AM 张铎(Duo Zhang)  wrote:

> Yes, the keys should be sorted before passing them to HFile.Writer. The
> way we build the index for an HFile is based on the assumption that the
> keys are sorted.
>
> Mike Thomsen wrote on Tue, Jan 1, 2019 at 9:36 PM:
>
> > I took some of the code from the JUnit test for setting up an HFile
> > writer and it looked like it should work.
> >
> > Path f = new Path("/", "test-something.hfile");
> > HFileContext context = new HFileContextBuilder()
> >     .withBlockSize(4096)
> >     .withIncludesTags(true)
> >     .withCompression(Compression.Algorithm.NONE).build();
> > HFile.Writer w = HFile.getWriterFactory(conf, cacheConf)
> >     .withPath(FileSystem.get(conf), f)
> >     .withFileContext(context)
> >     .withComparator(KeyValue.COMPARATOR)
> >     .create();
> > // KeyValue(byte[] row, byte[] family, byte[] qualifier, long timestamp, byte[] value)
> > Map records = new HashMap();
> > for (int x = 0; x < 1; x++) {
> >     String uuid = UUID.randomUUID().toString();
> >     KeyValue value = new KeyValue(uuid.getBytes(), "some-fam".getBytes(),
> >         "test".getBytes(), System.currentTimeMillis(), "hi".getBytes());
> >     w.append(value);
> > }
> >
> > That threw this exception:
> >
> > java.io.IOException: Added a key not lexically larger than previous.
> > Current cell = 8faf82ba-5fed-4332-9731-0ebf4d4494f9/some-fam:test/1546348994160/Put/vlen=2/seqid=0,
> > lastCell = e32ec727-d946-4f67-a3c6-315a27c76408/some-fam:test/1546348994160/Put/vlen=2/seqid=0
> > at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
> > at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:267)
> > at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> > at HFileTest.main(HFileTest.java:39)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
> > at java.lang.Thread.run(Thread.java:748)
> >
> > When I manually sorted the keys, it worked fine for creating and closing
> > the new HFile. Did I miss something or is this expected behavior from the
> > writer? I'm testing this against HBase 1.2.9.
> >
> > Thanks,
> >
> > Mike
> >
>


Re: Sorting keys with HFile.Writer

2019-01-01 Thread Duo Zhang
Yes, the keys should be sorted before passing them to HFile.Writer. The
way we build the index for an HFile is based on the assumption that the
keys are sorted.
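
A minimal end-to-end sketch of what that looks like in practice (assuming the
HBase 1.2.x API; the output path and record count below are made up): collect
the KeyValues in a TreeSet ordered by KeyValue.COMPARATOR, the same comparator
the writer is created with, and only then append them.

import java.util.TreeSet;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class SortedHFileWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    CacheConfig cacheConf = new CacheConfig(conf);

    HFileContext context = new HFileContextBuilder()
        .withBlockSize(4096)
        .withCompression(Compression.Algorithm.NONE)
        .build();

    // Hypothetical output location; adjust for your filesystem.
    Path f = new Path("/tmp", "test-something.hfile");
    HFile.Writer w = HFile.getWriterFactory(conf, cacheConf)
        .withPath(FileSystem.get(conf), f)
        .withFileContext(context)
        .withComparator(KeyValue.COMPARATOR)
        .create();

    // The TreeSet keeps the cells in KeyValue.COMPARATOR order, the same
    // comparator handed to the writer above, so the append order matches
    // what the writer's index-building code expects.
    TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
    for (int x = 0; x < 100; x++) {
      String uuid = UUID.randomUUID().toString();
      sorted.add(new KeyValue(Bytes.toBytes(uuid), Bytes.toBytes("some-fam"),
          Bytes.toBytes("test"), System.currentTimeMillis(), Bytes.toBytes("hi")));
    }

    try {
      for (KeyValue kv : sorted) {
        w.append(kv); // each key now sorts after the previous one
      }
    } finally {
      w.close();
    }
  }
}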

Mike Thomsen wrote on Tue, Jan 1, 2019 at 9:36 PM:

> I took some of the code from the JUnit test for setting up an HFile
> writer and it looked like it should work.
>
> Path f = new Path("/", "test-something.hfile");
> HFileContext context = new HFileContextBuilder()
>     .withBlockSize(4096)
>     .withIncludesTags(true)
>     .withCompression(Compression.Algorithm.NONE).build();
> HFile.Writer w = HFile.getWriterFactory(conf, cacheConf)
>     .withPath(FileSystem.get(conf), f)
>     .withFileContext(context)
>     .withComparator(KeyValue.COMPARATOR)
>     .create();
> // KeyValue(byte[] row, byte[] family, byte[] qualifier, long timestamp, byte[] value)
> Map records = new HashMap();
> for (int x = 0; x < 1; x++) {
>     String uuid = UUID.randomUUID().toString();
>     KeyValue value = new KeyValue(uuid.getBytes(), "some-fam".getBytes(),
>         "test".getBytes(), System.currentTimeMillis(), "hi".getBytes());
>     w.append(value);
> }
>
> That threw this exception:
>
> java.io.IOException: Added a key not lexically larger than previous.
> Current cell = 8faf82ba-5fed-4332-9731-0ebf4d4494f9/some-fam:test/1546348994160/Put/vlen=2/seqid=0,
> lastCell = e32ec727-d946-4f67-a3c6-315a27c76408/some-fam:test/1546348994160/Put/vlen=2/seqid=0
> at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
> at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:267)
> at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> at HFileTest.main(HFileTest.java:39)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
> at java.lang.Thread.run(Thread.java:748)
>
> When I manually sorted the keys, it worked fine for creating and closing
> the new HFile. Did I miss something or is this expected behavior from the
> writer? I'm testing this against HBase 1.2.9.
>
> Thanks,
>
> Mike
>