Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-03-11 Thread via GitHub


julienledem closed pull request #3390: Add ALP (Adaptive Lossless floating-Point) encoding support
URL: https://github.com/apache/parquet-java/pull/3390


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-03-11 Thread via GitHub


julienledem commented on PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#issuecomment-4040118296

   closed in favor of https://github.com/apache/parquet-java/pull/3397





Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719799163


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesWriter.java:
##
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.bytes.CapacityByteArrayOutputStream;
+import org.apache.parquet.column.Encoding;
+import org.apache.parquet.column.values.ValuesWriter;
+
+/**
+ * ALP (Adaptive Lossless floating-Point) values writer.
+ *
+ * ALP encoding converts floating-point values to integers using decimal 
scaling,
+ * then applies Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Page Layout:
+ * 
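+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 5B/9B × N      │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘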
+ * 
+ */
+public abstract class AlpValuesWriter extends ValuesWriter {
+
+  protected final int initialCapacity;
+  protected final int pageSize;
+  protected final ByteBufferAllocator allocator;
+  protected CapacityByteArrayOutputStream outputStream;
+
+  public AlpValuesWriter(int initialCapacity, int pageSize, 
ByteBufferAllocator allocator) {
+this.initialCapacity = initialCapacity;
+this.pageSize = pageSize;
+this.allocator = allocator;
+this.outputStream = new CapacityByteArrayOutputStream(initialCapacity, 
pageSize, allocator);
+  }
+
+  @Override
+  public Encoding getEncoding() {
+return Encoding.ALP;
+  }
+
+  @Override
+  public void close() {
+outputStream.close();
+  }
+
+  /**
+   * Float-specific ALP values writer.
+   */
+  public static class FloatAlpValuesWriter extends AlpValuesWriter {
+private float[] buffer;
+private int count;
+
+public FloatAlpValuesWriter(int initialCapacity, int pageSize, 
ByteBufferAllocator allocator) {
+  super(initialCapacity, pageSize, allocator);
+  // Initial buffer size - will grow as needed
+  this.buffer = new float[Math.max(ALP_VECTOR_SIZE, initialCapacity / Float.BYTES)];
+  this.count = 0;
+}
+
+@Override
+public void writeFloat(float v) {
+  if (count >= buffer.length) {

Review Comment:
   I don't think we should be growing here to all floats.  Once we reach 
vector-size we should be encoding the page.
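
To illustrate the suggestion above, here is a minimal sketch of a writer that holds at most one vector of floats and encodes it as soon as the vector fills, rather than doubling an array that ends up holding the whole page. The class name `VectorBufferingFloatWriter`, the `VECTOR_SIZE` value of 1024, and the placeholder encoder (raw little-endian floats) are assumptions made for the example; they are not taken from the PR, and the real ALP encoding step is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: buffers at most one vector of floats and encodes it when full. */
final class VectorBufferingFloatWriter {
  // Assumed vector size for illustration; the PR reads it from AlpConstants.ALP_VECTOR_SIZE.
  static final int VECTOR_SIZE = 1024;

  private final float[] vector = new float[VECTOR_SIZE];
  private final ByteArrayOutputStream encoded = new ByteArrayOutputStream();
  private int count = 0;
  private int encodedVectors = 0;

  void writeFloat(float v) {
    vector[count++] = v;
    if (count == VECTOR_SIZE) {
      flushVector(); // encode immediately; the buffer never grows past one vector
    }
  }

  /** Placeholder encoder: writes raw little-endian floats instead of real ALP output. */
  private void flushVector() {
    if (count == 0) {
      return;
    }
    ByteBuffer buf = ByteBuffer.allocate(count * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < count; i++) {
      buf.putFloat(vector[i]);
    }
    encoded.write(buf.array(), 0, buf.position());
    encodedVectors++;
    count = 0;
  }

  /** Encodes the trailing partial vector, if any, and returns all encoded bytes. */
  byte[] finishPage() {
    flushVector();
    return encoded.toByteArray();
  }

  int bufferedValues() {
    return count;
  }

  int encodedVectorCount() {
    return encodedVectors;
  }
}
```

With this shape the unencoded state is bounded by one vector regardless of how many values the page receives.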




Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719783661


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesWriter.java:
##
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.bytes.CapacityByteArrayOutputStream;
+import org.apache.parquet.column.Encoding;
+import org.apache.parquet.column.values.ValuesWriter;
+
+/**
+ * ALP (Adaptive Lossless floating-Point) values writer.
+ *
+ * ALP encoding converts floating-point values to integers using decimal 
scaling,
+ * then applies Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Page Layout:
+ * 
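+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 5B/9B × N      │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘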
+ * 
+ */
+public abstract class AlpValuesWriter extends ValuesWriter {
+
+  protected final int initialCapacity;
+  protected final int pageSize;
+  protected final ByteBufferAllocator allocator;
+  protected CapacityByteArrayOutputStream outputStream;
+
+  public AlpValuesWriter(int initialCapacity, int pageSize, 
ByteBufferAllocator allocator) {
+this.initialCapacity = initialCapacity;
+this.pageSize = pageSize;
+this.allocator = allocator;
+this.outputStream = new CapacityByteArrayOutputStream(initialCapacity, 
pageSize, allocator);
+  }
+
+  @Override
+  public Encoding getEncoding() {
+return Encoding.ALP;
+  }
+
+  @Override
+  public void close() {
+outputStream.close();
+  }
+
+  /**
+   * Float-specific ALP values writer.
+   */
+  public static class FloatAlpValuesWriter extends AlpValuesWriter {
+private float[] buffer;
+private int count;
+
+public FloatAlpValuesWriter(int initialCapacity, int pageSize, 
ByteBufferAllocator allocator) {
+  super(initialCapacity, pageSize, allocator);
+  // Initial buffer size - will grow as needed
+  this.buffer = new float[Math.max(ALP_VECTOR_SIZE, initialCapacity / 
Float.BYTES)];
+  this.count = 0;
+}
+
+@Override
+public void writeFloat(float v) {
+  if (count >= buffer.length) {
+// Grow buffer
+float[] newBuffer = new float[buffer.length * 2];
+System.arraycopy(buffer, 0, newBuffer, 0, count);
+buffer = newBuffer;
+  }
+  buffer[count++] = v;
+}
+
+@Override
+public long getBufferedSize() {
+  // Estimate: each float value contributes roughly 2-4 bytes after compression

Review Comment:
   I think we should be flushing incrementally to byte buffers for the encoded 
data to get a more accurate value.  For unflushed data the size heuristic is 
maybe OK for an initial version.
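
One way to read this suggestion: report exact byte counts for vectors that have already been encoded, and fall back to the heuristic only for values still waiting in the current vector. The sketch below assumes a hypothetical helper named `AlpBufferedSizeEstimator` and a per-value estimate of 3 bytes (the midpoint of the 2-4 byte range in the quoted comment); neither comes from the PR.

```java
/** Hypothetical size tracker: exact bytes for flushed vectors, a heuristic for the rest. */
final class AlpBufferedSizeEstimator {
  // Assumed estimate for values that have not been encoded yet.
  static final int ESTIMATED_BYTES_PER_UNFLUSHED_FLOAT = 3;

  private long flushedBytes = 0;
  private int unflushedValues = 0;

  /** Call once per value appended to the in-progress vector. */
  void onValueBuffered() {
    unflushedValues++;
  }

  /** Call after a vector is encoded, passing the real number of bytes it produced. */
  void onVectorFlushed(int valuesInVector, int encodedBytes) {
    if (valuesInVector > unflushedValues) {
      throw new IllegalArgumentException("flushed more values than were buffered");
    }
    unflushedValues -= valuesInVector;
    flushedBytes += encodedBytes;
  }

  /** Exact size of encoded data plus an estimate for the unflushed tail. */
  long getBufferedSize() {
    return flushedBytes + (long) unflushedValues * ESTIMATED_BYTES_PER_UNFLUSHED_FLOAT;
  }
}
```

The estimate error is then limited to at most one vector's worth of values, instead of applying to the whole page.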






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719770156


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesWriter.java:
##
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.bytes.CapacityByteArrayOutputStream;
+import org.apache.parquet.column.Encoding;
+import org.apache.parquet.column.values.ValuesWriter;
+
+/**
+ * ALP (Adaptive Lossless floating-Point) values writer.
+ *
+ * ALP encoding converts floating-point values to integers using decimal 
scaling,
+ * then applies Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Page Layout:
+ * 
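+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 5B/9B × N      │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘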
+ * 
+ */
+public abstract class AlpValuesWriter extends ValuesWriter {
+
+  protected final int initialCapacity;
+  protected final int pageSize;
+  protected final ByteBufferAllocator allocator;
+  protected CapacityByteArrayOutputStream outputStream;
+
+  public AlpValuesWriter(int initialCapacity, int pageSize, 
ByteBufferAllocator allocator) {
+this.initialCapacity = initialCapacity;
+this.pageSize = pageSize;
+this.allocator = allocator;
+this.outputStream = new CapacityByteArrayOutputStream(initialCapacity, 
pageSize, allocator);
+  }
+
+  @Override
+  public Encoding getEncoding() {
+return Encoding.ALP;
+  }
+
+  @Override
+  public void close() {
+outputStream.close();
+  }
+
+  /**
+   * Float-specific ALP values writer.
+   */
+  public static class FloatAlpValuesWriter extends AlpValuesWriter {
+private float[] buffer;
+private int count;
+
+public FloatAlpValuesWriter(int initialCapacity, int pageSize, 
ByteBufferAllocator allocator) {
+  super(initialCapacity, pageSize, allocator);
+  // Initial buffer size - will grow as needed
+  this.buffer = new float[Math.max(ALP_VECTOR_SIZE, initialCapacity / Float.BYTES)];

Review Comment:
   This should just be pageSize?






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719763894


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
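+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘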
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
+int version = headerBuf.get() & 0xFF;
+int compressionMode = headerBuf.get() & 0xFF;
+int integerEncoding = headerBuf.get() & 0xFF;
+int logVectorSize = headerBuf.get() & 0xFF;
+int numElements = headerBuf.getInt();
+
+if (version != ALP_VERSION) {
+  throw new ParquetDecodingException("Unsupported ALP version: " + version 
+ ", expected " + ALP_VERSION);
+}
+if (compressionMode != ALP_COMPRESSION_MODE) {
+  throw new ParquetDecodingException("Unsupported ALP compression mode: " 
+ compressionMode);
+}
+if (integerEncoding != ALP_INTEGER_ENCODING_FOR) {
+  throw new ParquetDecodingException("Unsupported ALP integer encoding: " 
+ integerEncoding);
+}
+
+int vectorSize = 1 << logVectorSize;
+int numVectors = (numElements + vectorSize - 1) / vectorSize;
+
+this.totalCount = numElements;
+this.decodedValues = new double[numElements];
+this.currentIndex = 0;
+
+// Read AlpInfo array
+ByteBuffer alpInfoBuf = stream.slice(ALP_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+int[] exponents = new int[numVectors];
+int[] factors = new int[numVectors];
+int[] numExceptions = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  exponents[v] = alpInfoBuf.get() & 0xFF;
+  factors[v] = alpInfoBuf.get() & 0xFF;
+  numExceptions[v] = alpInfoBuf.getShort() & 0xFFFF;
+}
+
+// Read ForInfo array
+ByteBuffer forInfoBuf = stream.slice(DOUBLE_FOR_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+long[] frameOfReference = new long[numVectors];
+int[] bitWidths = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  frameOfReference[v] = forInfoBuf.getLong();
+  bitWidths[v] = forInfoBuf.get() & 0xFF;
+}
+
+// Decode each vector
+for (int v = 0; v < numVectors; v++) {
+  int vectorStart = v * vectorSize;
+  int vectorEnd = Math.min(vectorStart + vectorSize, numElements);
+  int vectorLen = vectorEnd - vectorStart;
+
+  // Calculate packed data size
+  int packedBytes = (vectorLen * bitWidths[v] + 7) / 8;
+
+  // Read and unpack values
+  long[] deltas = new long[vectorLen];
+  if (bitWidths[v] > 0) {
+ByteBuffer packedBuf = 
stream

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719726613


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesWriter.java:
##
@@ -0,0 +1,525 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferAllocator;
+import org.apache.parquet.bytes.BytesInput;
+import org.apache.parquet.bytes.CapacityByteArrayOutputStream;
+import org.apache.parquet.column.Encoding;
+import org.apache.parquet.column.values.ValuesWriter;
+
+/**
+ * ALP (Adaptive Lossless floating-Point) values writer.
+ *
+ * ALP encoding converts floating-point values to integers using decimal 
scaling,
+ * then applies Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Page Layout:
+ * 
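+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 5B/9B × N      │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘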
+ * 
+ */
+public abstract class AlpValuesWriter extends ValuesWriter {
+
+  protected final int initialCapacity;
+  protected final int pageSize;
+  protected final ByteBufferAllocator allocator;
+  protected CapacityByteArrayOutputStream outputStream;
+
+  public AlpValuesWriter(int initialCapacity, int pageSize, 
ByteBufferAllocator allocator) {
+this.initialCapacity = initialCapacity;
+this.pageSize = pageSize;
+this.allocator = allocator;
+this.outputStream = new CapacityByteArrayOutputStream(initialCapacity, 
pageSize, allocator);
+  }
+
+  @Override
+  public Encoding getEncoding() {
+return Encoding.ALP;
+  }
+
+  @Override
+  public void close() {
+outputStream.close();
+  }
+
+  /**
+   * Float-specific ALP values writer.
+   */
+  public static class FloatAlpValuesWriter extends AlpValuesWriter {
+private float[] buffer;
+private int count;
+
+public FloatAlpValuesWriter(int initialCapacity, int pageSize, 
ByteBufferAllocator allocator) {
+  super(initialCapacity, pageSize, allocator);
+  // Initial buffer size - will grow as needed
+  this.buffer = new float[Math.max(ALP_VECTOR_SIZE, initialCapacity / Float.BYTES)];

Review Comment:
   I don't think I understand this logic.






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719722810


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
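+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘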
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
+int version = headerBuf.get() & 0xFF;
+int compressionMode = headerBuf.get() & 0xFF;
+int integerEncoding = headerBuf.get() & 0xFF;
+int logVectorSize = headerBuf.get() & 0xFF;
+int numElements = headerBuf.getInt();
+
+if (version != ALP_VERSION) {
+  throw new ParquetDecodingException("Unsupported ALP version: " + version 
+ ", expected " + ALP_VERSION);
+}
+if (compressionMode != ALP_COMPRESSION_MODE) {
+  throw new ParquetDecodingException("Unsupported ALP compression mode: " 
+ compressionMode);
+}
+if (integerEncoding != ALP_INTEGER_ENCODING_FOR) {
+  throw new ParquetDecodingException("Unsupported ALP integer encoding: " 
+ integerEncoding);
+}
+
+int vectorSize = 1 << logVectorSize;
+int numVectors = (numElements + vectorSize - 1) / vectorSize;
+
+this.totalCount = numElements;
+this.decodedValues = new double[numElements];
+this.currentIndex = 0;
+
+// Read AlpInfo array
+ByteBuffer alpInfoBuf = stream.slice(ALP_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+int[] exponents = new int[numVectors];
+int[] factors = new int[numVectors];
+int[] numExceptions = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  exponents[v] = alpInfoBuf.get() & 0xFF;
+  factors[v] = alpInfoBuf.get() & 0xFF;
+  numExceptions[v] = alpInfoBuf.getShort() & 0xFFFF;
+}
+
+// Read ForInfo array
+ByteBuffer forInfoBuf = stream.slice(DOUBLE_FOR_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+long[] frameOfReference = new long[numVectors];
+int[] bitWidths = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  frameOfReference[v] = forInfoBuf.getLong();
+  bitWidths[v] = forInfoBuf.get() & 0xFF;
+}
+
+// Decode each vector

Review Comment:
   Do we really want to do this all up-front?  Don't we only want to decode one 
vector at a time?
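
A minimal sketch of the lazy alternative raised here: keep only the currently needed vector decoded, and decode the next one when the read position crosses a vector boundary. `LazyVectorReader` and its `IntFunction<double[]>` decoder callback are hypothetical names chosen for the example; the callback stands in for the per-vector unpack and ALP decode logic, which is not reproduced.

```java
import java.util.function.IntFunction;

/** Hypothetical sketch: decodes one vector on demand instead of the whole page up front. */
final class LazyVectorReader {
  private final int vectorSize;
  private final int totalCount;
  // Assumed callback: given a vector index, returns that vector's decoded values.
  private final IntFunction<double[]> vectorDecoder;

  private double[] current = new double[0];
  private int currentVector = -1;
  private int index = 0; // absolute position within the page
  private int decodedVectors = 0;

  LazyVectorReader(int vectorSize, int totalCount, IntFunction<double[]> vectorDecoder) {
    this.vectorSize = vectorSize;
    this.totalCount = totalCount;
    this.vectorDecoder = vectorDecoder;
  }

  double readDouble() {
    if (index >= totalCount) {
      throw new IllegalStateException("no more values in page");
    }
    int v = index / vectorSize;
    if (v != currentVector) {
      current = vectorDecoder.apply(v); // decode only the vector being read
      currentVector = v;
      decodedVectors++;
    }
    double value = current[index % vectorSize];
    index++;
    return value;
  }

  /** Advances the position without decoding; vectors skipped entirely are never decoded. */
  void skip(int n) {
    if (n < 0) {
      throw new IllegalArgumentException("n must be non-negative");
    }
    index = Math.min(totalCount, index + n);
  }

  int decodedVectorCount() {
    return decodedVectors;
  }
}
```

Memory for decoded values is bounded by one vector, and a caller that skips past a whole vector pays no decode cost for it. Locating a vector's packed bytes without scanning would still need per-vector offsets, which this sketch leaves to the decoder callback.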




Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719723926


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForFloat.java:
##
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   Not reviewing this directly, but I think all comments on the double reader apply here.






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719717695


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
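+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘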
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
+int version = headerBuf.get() & 0xFF;
+int compressionMode = headerBuf.get() & 0xFF;
+int integerEncoding = headerBuf.get() & 0xFF;
+int logVectorSize = headerBuf.get() & 0xFF;
+int numElements = headerBuf.getInt();
+
+if (version != ALP_VERSION) {
+  throw new ParquetDecodingException("Unsupported ALP version: " + version 
+ ", expected " + ALP_VERSION);
+}
+if (compressionMode != ALP_COMPRESSION_MODE) {
+  throw new ParquetDecodingException("Unsupported ALP compression mode: " 
+ compressionMode);
+}
+if (integerEncoding != ALP_INTEGER_ENCODING_FOR) {
+  throw new ParquetDecodingException("Unsupported ALP integer encoding: " 
+ integerEncoding);
+}
+
+int vectorSize = 1 << logVectorSize;
+int numVectors = (numElements + vectorSize - 1) / vectorSize;
+
+this.totalCount = numElements;
+this.decodedValues = new double[numElements];
+this.currentIndex = 0;
+
+// Read AlpInfo array
+ByteBuffer alpInfoBuf = stream.slice(ALP_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+int[] exponents = new int[numVectors];
+int[] factors = new int[numVectors];
+int[] numExceptions = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  exponents[v] = alpInfoBuf.get() & 0xFF;
+  factors[v] = alpInfoBuf.get() & 0xFF;
+  numExceptions[v] = alpInfoBuf.getShort() & 0xFFFF;
+}
+
+// Read ForInfo array
+ByteBuffer forInfoBuf = stream.slice(DOUBLE_FOR_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+long[] frameOfReference = new long[numVectors];
+int[] bitWidths = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  frameOfReference[v] = forInfoBuf.getLong();
+  bitWidths[v] = forInfoBuf.get() & 0xFF;
+}
+
+// Decode each vector
+for (int v = 0; v < numVectors; v++) {
+  int vectorStart = v * vectorSize;
+  int vectorEnd = Math.min(vectorStart + vectorSize, numElements);
+  int vectorLen = vectorEnd - vectorStart;
+
+  // Calculate packed data size
+  int packedBytes = (vectorLen * bitWidths[v] + 7) / 8;
+
+  // Read and unpack values
+  long[] deltas = new long[vectorLen];
+  if (bitWidths[v] > 0) {
+ByteBuffer packedBuf = 
stream

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719715824


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
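+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘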
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
+int version = headerBuf.get() & 0xFF;
+int compressionMode = headerBuf.get() & 0xFF;
+int integerEncoding = headerBuf.get() & 0xFF;
+int logVectorSize = headerBuf.get() & 0xFF;
+int numElements = headerBuf.getInt();
+
+if (version != ALP_VERSION) {
+  throw new ParquetDecodingException("Unsupported ALP version: " + version 
+ ", expected " + ALP_VERSION);
+}
+if (compressionMode != ALP_COMPRESSION_MODE) {
+  throw new ParquetDecodingException("Unsupported ALP compression mode: " 
+ compressionMode);
+}
+if (integerEncoding != ALP_INTEGER_ENCODING_FOR) {
+  throw new ParquetDecodingException("Unsupported ALP integer encoding: " 
+ integerEncoding);
+}
+
+int vectorSize = 1 << logVectorSize;
+int numVectors = (numElements + vectorSize - 1) / vectorSize;
+
+this.totalCount = numElements;
+this.decodedValues = new double[numElements];
+this.currentIndex = 0;
+
+// Read AlpInfo array
+ByteBuffer alpInfoBuf = stream.slice(ALP_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+int[] exponents = new int[numVectors];
+int[] factors = new int[numVectors];
+int[] numExceptions = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  exponents[v] = alpInfoBuf.get() & 0xFF;
+  factors[v] = alpInfoBuf.get() & 0xFF;
+  numExceptions[v] = alpInfoBuf.getShort() & 0xFFFF;
+}
+
+// Read ForInfo array
+ByteBuffer forInfoBuf = stream.slice(DOUBLE_FOR_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+long[] frameOfReference = new long[numVectors];
+int[] bitWidths = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  frameOfReference[v] = forInfoBuf.getLong();
+  bitWidths[v] = forInfoBuf.get() & 0xFF;
+}
+
+// Decode each vector
+for (int v = 0; v < numVectors; v++) {
+  int vectorStart = v * vectorSize;
+  int vectorEnd = Math.min(vectorStart + vectorSize, numElements);
+  int vectorLen = vectorEnd - vectorStart;
+
+  // Calculate packed data size
+  int packedBytes = (vectorLen * bitWidths[v] + 7) / 8;
+
+  // Read and unpack values
+  long[] deltas = new long[vectorLen];
+  if (bitWidths[v] > 0) {
+ByteBuffer packedBuf = 
stream

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719710393


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
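+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘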
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
+int version = headerBuf.get() & 0xFF;
+int compressionMode = headerBuf.get() & 0xFF;
+int integerEncoding = headerBuf.get() & 0xFF;
+int logVectorSize = headerBuf.get() & 0xFF;
+int numElements = headerBuf.getInt();
+
+if (version != ALP_VERSION) {
+  throw new ParquetDecodingException("Unsupported ALP version: " + version 
+ ", expected " + ALP_VERSION);
+}
+if (compressionMode != ALP_COMPRESSION_MODE) {
+  throw new ParquetDecodingException("Unsupported ALP compression mode: " 
+ compressionMode);
+}
+if (integerEncoding != ALP_INTEGER_ENCODING_FOR) {
+  throw new ParquetDecodingException("Unsupported ALP integer encoding: " 
+ integerEncoding);
+}
+
+int vectorSize = 1 << logVectorSize;
+int numVectors = (numElements + vectorSize - 1) / vectorSize;
+
+this.totalCount = numElements;
+this.decodedValues = new double[numElements];
+this.currentIndex = 0;
+
+// Read AlpInfo array
+ByteBuffer alpInfoBuf = stream.slice(ALP_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);
+int[] exponents = new int[numVectors];
+int[] factors = new int[numVectors];
+int[] numExceptions = new int[numVectors];
+
+for (int v = 0; v < numVectors; v++) {
+  exponents[v] = alpInfoBuf.get() & 0xFF;
+  factors[v] = alpInfoBuf.get() & 0xFF;
+  numExceptions[v] = alpInfoBuf.getShort() & 0xFFFF;
+}
+
+// Read ForInfo array
+ByteBuffer forInfoBuf = stream.slice(DOUBLE_FOR_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);

Review Comment:
   We might as well slice this together with AlpInfo if we keep the slices.
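
A minimal sketch of what slicing the two metadata arrays together could look like. It uses a plain `java.nio.ByteBuffer` rather than the PR's `ByteBufferInputStream` so that it is self-contained; the class name `AlpMetadataSketch` and its fields are hypothetical, and the 4-byte and 9-byte record sizes are taken from the page layout comment in the diff.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: parse AlpInfo and ForInfo from one combined view. */
final class AlpMetadataSketch {
  // Sizes taken from the page layout comment in the diff.
  static final int ALP_INFO_SIZE = 4; // exponent (1B) + factor (1B) + numExceptions (2B)
  static final int DOUBLE_FOR_INFO_SIZE = 9; // frame of reference (8B) + bit width (1B)

  final int[] exponents;
  final int[] factors;
  final int[] numExceptions;
  final long[] frameOfReference;
  final int[] bitWidths;

  private AlpMetadataSketch(int numVectors) {
    exponents = new int[numVectors];
    factors = new int[numVectors];
    numExceptions = new int[numVectors];
    frameOfReference = new long[numVectors];
    bitWidths = new int[numVectors];
  }

  /** Reads both arrays from a single little-endian view starting at page.position(). */
  static AlpMetadataSketch parse(ByteBuffer page, int numVectors) {
    int metadataBytes = (ALP_INFO_SIZE + DOUBLE_FOR_INFO_SIZE) * numVectors;
    // slice() resets the byte order to big-endian, so set it again on the view.
    ByteBuffer buf = page.slice().order(ByteOrder.LITTLE_ENDIAN);
    buf.limit(metadataBytes); // throws IllegalArgumentException if the page is too short

    AlpMetadataSketch m = new AlpMetadataSketch(numVectors);
    // AlpInfo array: N records of 4 bytes each.
    for (int v = 0; v < numVectors; v++) {
      m.exponents[v] = buf.get() & 0xFF;
      m.factors[v] = buf.get() & 0xFF;
      m.numExceptions[v] = buf.getShort() & 0xFFFF; // unsigned 16-bit count
    }
    // ForInfo array follows immediately: N records of 9 bytes each.
    for (int v = 0; v < numVectors; v++) {
      m.frameOfReference[v] = buf.getLong();
      m.bitWidths[v] = buf.get() & 0xFF;
    }
    return m;
  }
}
```

One view replaces two `slice` calls. Note that this sketch does not advance the position of `page`; a real reader would still have to skip `metadataBytes` before reading the data array.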






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719707875


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
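+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘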
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
+int version = headerBuf.get() & 0xFF;
+int compressionMode = headerBuf.get() & 0xFF;
+int integerEncoding = headerBuf.get() & 0xFF;
+int logVectorSize = headerBuf.get() & 0xFF;
+int numElements = headerBuf.getInt();
+
+if (version != ALP_VERSION) {
+  throw new ParquetDecodingException("Unsupported ALP version: " + version 
+ ", expected " + ALP_VERSION);
+}
+if (compressionMode != ALP_COMPRESSION_MODE) {
+  throw new ParquetDecodingException("Unsupported ALP compression mode: " 
+ compressionMode);
+}
+if (integerEncoding != ALP_INTEGER_ENCODING_FOR) {
+  throw new ParquetDecodingException("Unsupported ALP integer encoding: " 
+ integerEncoding);
+}
+
+int vectorSize = 1 << logVectorSize;
+int numVectors = (numElements + vectorSize - 1) / vectorSize;
+
+this.totalCount = numElements;
+this.decodedValues = new double[numElements];
+this.currentIndex = 0;
+
+// Read AlpInfo array
+ByteBuffer alpInfoBuf = stream.slice(ALP_INFO_SIZE * 
numVectors).order(ByteOrder.LITTLE_ENDIAN);

Review Comment:
   same comment as above on avoiding the slice.






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719704733


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
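+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘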
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);

Review Comment:
   It looks like this call does a copy? I wonder if we are better off using 
`DataInputStream` and reversing the bytes for `numElements`.
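
A minimal sketch of that alternative, assuming the 8-byte header layout shown in the diff (four single-byte fields followed by a little-endian 32-bit element count). `DataInputStream.readInt()` always reads big-endian, so `Integer.reverseBytes` converts the result. The class name `AlpHeaderSketch` and the `parse(byte[])` convenience wrapper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch: read the ALP page header without slicing a ByteBuffer. */
final class AlpHeaderSketch {
  final int version;
  final int compressionMode;
  final int integerEncoding;
  final int logVectorSize;
  final int numElements;

  private AlpHeaderSketch(int version, int compressionMode, int integerEncoding,
      int logVectorSize, int numElements) {
    this.version = version;
    this.compressionMode = compressionMode;
    this.integerEncoding = integerEncoding;
    this.logVectorSize = logVectorSize;
    this.numElements = numElements;
  }

  /** Reads the header fields directly from any InputStream. */
  static AlpHeaderSketch read(InputStream stream) throws IOException {
    DataInputStream in = new DataInputStream(stream);
    int version = in.readUnsignedByte();
    int compressionMode = in.readUnsignedByte();
    int integerEncoding = in.readUnsignedByte();
    int logVectorSize = in.readUnsignedByte();
    // readInt() is big-endian; the header stores the count little-endian.
    int numElements = Integer.reverseBytes(in.readInt());
    return new AlpHeaderSketch(version, compressionMode, integerEncoding,
        logVectorSize, numElements);
  }

  /** Convenience wrapper for in-memory bytes; rethrows IOException unchecked. */
  static AlpHeaderSketch parse(byte[] bytes) {
    try {
      return read(new ByteArrayInputStream(bytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For example, the header bytes `{1, 0, 0, 10, 0x00, 0x04, 0x00, 0x00}` (field values chosen arbitrarily for illustration) parse to `logVectorSize = 10` and `numElements = 1024`.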






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719701209


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
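+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘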
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
+int version = headerBuf.get() & 0xFF;
+int compressionMode = headerBuf.get() & 0xFF;
+int integerEncoding = headerBuf.get() & 0xFF;
+int logVectorSize = headerBuf.get() & 0xFF;
+int numElements = headerBuf.getInt();

Review Comment:
   check numElements <= valuesCount ?
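
A minimal sketch of such a check. Because `numElements` is read with a signed `getInt()`, a corrupt header can also yield a negative value, which would make `new double[numElements]` fail with `NegativeArraySizeException`, so the sketch rejects that too. The class and method names are hypothetical, and the reader itself would presumably throw `ParquetDecodingException`; `IllegalArgumentException` is used here only to keep the example self-contained.

```java
/** Hypothetical sketch: validate the header's element count before allocating. */
final class AlpElementCountCheck {
  /**
   * Returns numElements if it is within [0, valuesCount]; otherwise throws.
   * valuesCount is the count passed to initFromPage by the caller.
   */
  static int checkNumElements(int numElements, int valuesCount) {
    if (numElements < 0 || numElements > valuesCount) {
      throw new IllegalArgumentException(
          "Invalid ALP element count: " + numElements
              + ", page declares " + valuesCount + " values");
    }
    return numElements;
  }
}
```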






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719700566


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpValuesReaderForDouble.java:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import org.apache.parquet.bytes.ByteBufferInputStream;
+import org.apache.parquet.column.values.ValuesReader;
+import org.apache.parquet.io.ParquetDecodingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * ALP values reader for DOUBLE type.
+ *
+ * Reads ALP-encoded double values from a page and decodes them back to 
double values.
+ *
+ * Page Layout:
+ * 
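+ * ┌─────────┬────────────────┬────────────────┬─────────────┐
+ * │ Header  │ AlpInfo Array  │ ForInfo Array  │ Data Array  │
+ * │ 8 bytes │ 4B × N vectors │ 9B × N vectors │ Variable    │
+ * └─────────┴────────────────┴────────────────┴─────────────┘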
+ * 
+ */
+public class AlpValuesReaderForDouble extends ValuesReader {
+  private static final Logger LOG = 
LoggerFactory.getLogger(AlpValuesReaderForDouble.class);
+
+  // Decoded double values (eagerly decoded)
+  private double[] decodedValues;
+  private int currentIndex;
+  private int totalCount;
+
+  public AlpValuesReaderForDouble() {
+this.currentIndex = 0;
+this.totalCount = 0;
+  }
+
+  @Override
+  public void initFromPage(int valuesCount, ByteBufferInputStream stream)
+  throws ParquetDecodingException, IOException {
+LOG.debug("init from page at offset {} for length {}", stream.position(), 
stream.available());
+
+// Read and validate header
+ByteBuffer headerBuf = 
stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
+int version = headerBuf.get() & 0xFF;
+int compressionMode = headerBuf.get() & 0xFF;
+int integerEncoding = headerBuf.get() & 0xFF;
+int logVectorSize = headerBuf.get() & 0xFF;
+int numElements = headerBuf.getInt();
+
+if (version != ALP_VERSION) {
+  throw new ParquetDecodingException("Unsupported ALP version: " + version 
+ ", expected " + ALP_VERSION);
+}
+if (compressionMode != ALP_COMPRESSION_MODE) {
+  throw new ParquetDecodingException("Unsupported ALP compression mode: " 
+ compressionMode);
+}
+if (integerEncoding != ALP_INTEGER_ENCODING_FOR) {
+  throw new ParquetDecodingException("Unsupported ALP integer encoding: " 
+ integerEncoding);
+}
+

Review Comment:
   Let's check `0 <= logVectorSize <= 16`.
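
A minimal sketch of the range check. The header byte is read unsigned, so it can be as large as 255, and Java's `int` shift uses only the low five bits of the shift count (`1 << 32 == 1`), which means an unchecked value can silently produce a wrong or negative vector size. The bound of 16 is the one proposed in this comment; the class and method names are hypothetical, and `IllegalArgumentException` stands in for whatever exception the reader would actually throw.

```java
/** Hypothetical sketch: validate logVectorSize before computing the vector size. */
final class AlpVectorSizeCheck {
  static final int MAX_LOG_VECTOR_SIZE = 16; // upper bound proposed in the review

  /** Returns 1 << logVectorSize after checking 0 <= logVectorSize <= 16. */
  static int vectorSize(int logVectorSize) {
    if (logVectorSize < 0 || logVectorSize > MAX_LOG_VECTOR_SIZE) {
      throw new IllegalArgumentException(
          "Invalid ALP log vector size: " + logVectorSize
              + ", expected 0.." + MAX_LOG_VECTOR_SIZE);
    }
    return 1 << logVectorSize;
  }

  /** Ceiling division in long arithmetic so the intermediate sum cannot overflow. */
  static int numVectors(int numElements, int vectorSize) {
    return (int) (((long) numElements + vectorSize - 1) / vectorSize);
  }
}
```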






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719663726


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal 
scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+// Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+// NaN check
+if (Float.isNaN(value)) {
+  return true;
+}
+// Infinity check
+if (Float.isInfinite(value)) {
+  return true;
+}
+// Negative zero check
+if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int 
factor) {
+if (isFloatException(value)) {
+  return true;
+}
+
+// Try encoding and check for round-trip failure
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+
+float scaled = value * multiplier;
+
+// Check for overflow
+if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
+  return true;
+}
+
+// Fast round
+int encoded = fastRoundFloat(scaled);
+
+// Check round-trip
+float decoded = encoded / multiplier;
+return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
+  }
+
+  /**
+   * Encode a float value to an integer using the specified exponent and 
factor.
+   *
+   * Formula: encoded = round(value * 10^(exponent - factor))
+   *
+   * @param value    the float value to encode
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the encoded integer value
+   */
+  public static int encodeFloat(float value, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return fastRoundFloat(value * multiplier);
+  }
+
+  /**
+   * Decode an integer back to a float using the specified exponent and factor.
+   *
+   * Formula: value = encoded / 10^(exponent - factor)
+   *
+   * @param encoded  the encoded integer value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the decoded float value
+   */
+  public static float decodeFloat(int encoded, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return encoded / multiplier;
+  }
+
+  /**
+   * Fast rounding for float values using the magic number technique.
+   *
+   * @param value the float value to round
+   * @return the rounded integer value
+   */
+  public
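
For readers following the quoted diff, which is cut off here, the encode/decode formulas and the exception rules above can be condensed into a short sketch. It is not the PR's code: the class name `AlpFloatSketch` is hypothetical, the power-of-ten table is redeclared locally, and `Math.round` stands in for the magic-number fast rounding, whose body is not visible in this message.

```java
/** Hypothetical sketch of the ALP float round trip described in the javadoc above. */
final class AlpFloatSketch {
  // Powers of ten for exponents 0..10, redeclared here for self-containment.
  private static final float[] POW10 = {
    1e0f, 1e1f, 1e2f, 1e3f, 1e4f, 1e5f, 1e6f, 1e7f, 1e8f, 1e9f, 1e10f
  };
  private static final int NEGATIVE_ZERO_BITS = 0x80000000;

  /** 10^(exponent - factor), computed as in the diff: POW10[exponent] / POW10[factor]. */
  static float multiplier(int exponent, int factor) {
    return POW10[exponent] / POW10[factor];
  }

  /** encoded = round(value * 10^(exponent - factor)); Math.round replaces fastRoundFloat. */
  static int encode(float value, int exponent, int factor) {
    return Math.round(value * multiplier(exponent, factor));
  }

  /** value = encoded / 10^(exponent - factor) */
  static float decode(int encoded, int exponent, int factor) {
    return encoded / multiplier(exponent, factor);
  }

  /** True if the value must be stored as an exception for this exponent/factor pair. */
  static boolean isException(float value, int exponent, int factor) {
    if (Float.isNaN(value) || Float.isInfinite(value)
        || Float.floatToRawIntBits(value) == NEGATIVE_ZERO_BITS) {
      return true;
    }
    float scaled = value * multiplier(exponent, factor);
    if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
      return true; // out of integer range
    }
    float decoded = decode(Math.round(scaled), exponent, factor);
    // Bitwise comparison: the round trip must reproduce the exact float.
    return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
  }
}
```

With exponent 2 and factor 0, `1.23f` encodes to `123` and decodes back bit-for-bit, while `(float) Math.PI` rounds to `314`, decodes to `3.14f`, and is therefore an exception.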

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719675534


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal 
scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+// Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+// NaN check
+if (Float.isNaN(value)) {
+  return true;
+}
+// Infinity check
+if (Float.isInfinite(value)) {
+  return true;
+}
+// Negative zero check
+if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int factor) {
+if (isFloatException(value)) {
+  return true;
+}
+
+// Try encoding and check for round-trip failure
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+
+float scaled = value * multiplier;
+
+// Check for overflow
+if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
+  return true;
+}
+
+// Fast round
+int encoded = fastRoundFloat(scaled);
+
+// Check round-trip
+float decoded = encoded / multiplier;
+return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
+  }
+
+  /**
+   * Encode a float value to an integer using the specified exponent and factor.
+   *
+   * Formula: encoded = round(value * 10^(exponent - factor))
+   *
+   * @param value    the float value to encode
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the encoded integer value
+   */
+  public static int encodeFloat(float value, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return fastRoundFloat(value * multiplier);
+  }
+
+  /**
+   * Decode an integer back to a float using the specified exponent and factor.
+   *
+   * Formula: value = encoded / 10^(exponent - factor)
+   *
+   * @param encoded  the encoded integer value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the decoded float value
+   */
+  public static float decodeFloat(int encoded, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return encoded / multiplier;
+  }
+
+  /**
+   * Fast rounding for float values using the magic number technique.
+   *
+   * @param value the float value to round
+   * @return the rounded integer value
+   */
+  public

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719666259


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+// Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {

Review Comment:
   Please audit the visibility of these methods; it seems most could be at least package-private?
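
   A minimal sketch of what that could look like (illustrative only, not code from the PR). The class name `AlpVisibilitySketch` is hypothetical, and the negative-zero constant is redefined locally, with an assumed value, so the snippet compiles on its own. Dropping the `public` modifier leaves the method package-private: it stays callable from other classes and tests in the same package but is no longer part of the public API.

```java
// Hypothetical sketch, not the PR's code. In the PR this logic lives in
// org.apache.parquet.column.values.alp.AlpEncoderDecoder; the package
// declaration is omitted here so the snippet is self-contained.
final class AlpVisibilitySketch {

  // Assumed value: the raw bit pattern of -0.0f (sign bit set, all else zero).
  static final int FLOAT_NEGATIVE_ZERO_BITS = 0x80000000;

  private AlpVisibilitySketch() {
    // Utility class
  }

  // No access modifier means package-private: visible only inside the package.
  static boolean isFloatException(float value) {
    return Float.isNaN(value)
        || Float.isInfinite(value)
        || Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS;
  }
}
```

   Tests placed in the same package can still call `isFloatException` directly, so narrowing the visibility should not require reflection or test-only accessors.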






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719661887


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+// Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+// NaN check
+if (Float.isNaN(value)) {
+  return true;
+}
+// Infinity check
+if (Float.isInfinite(value)) {
+  return true;
+}
+// Negative zero check
+if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int factor) {
+if (isFloatException(value)) {
+  return true;
+}
+
+// Try encoding and check for round-trip failure
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+
+float scaled = value * multiplier;
+
+// Check for overflow
+if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
+  return true;
+}
+
+// Fast round
+int encoded = fastRoundFloat(scaled);
+
+// Check round-trip
+float decoded = encoded / multiplier;
+return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
+  }
+
+  /**
+   * Encode a float value to an integer using the specified exponent and factor.
+   *
+   * Formula: encoded = round(value * 10^(exponent - factor))
+   *
+   * @param value    the float value to encode
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the encoded integer value
+   */
+  public static int encodeFloat(float value, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return fastRoundFloat(value * multiplier);
+  }
+
+  /**
+   * Decode an integer back to a float using the specified exponent and factor.
+   *
+   * Formula: value = encoded / 10^(exponent - factor)
+   *
+   * @param encoded  the encoded integer value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the decoded float value
+   */
+  public static float decodeFloat(int encoded, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return encoded / multiplier;
+  }
+
+  /**
+   * Fast rounding for float values using the magic number technique.
+   *
+   * @param value the float value to round
+   * @return the rounded integer value
+   */
+  public

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719661063


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+// Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+// NaN check
+if (Float.isNaN(value)) {
+  return true;
+}
+// Infinity check
+if (Float.isInfinite(value)) {
+  return true;
+}
+// Negative zero check
+if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int factor) {
+if (isFloatException(value)) {
+  return true;
+}
+
+// Try encoding and check for round-trip failure
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+
+float scaled = value * multiplier;
+
+// Check for overflow
+if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
+  return true;
+}
+
+// Fast round
+int encoded = fastRoundFloat(scaled);
+
+// Check round-trip
+float decoded = encoded / multiplier;
+return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
+  }
+
+  /**
+   * Encode a float value to an integer using the specified exponent and factor.
+   *
+   * Formula: encoded = round(value * 10^(exponent - factor))
+   *
+   * @param value    the float value to encode
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the encoded integer value
+   */
+  public static int encodeFloat(float value, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return fastRoundFloat(value * multiplier);
+  }
+
+  /**
+   * Decode an integer back to a float using the specified exponent and factor.
+   *
+   * Formula: value = encoded / 10^(exponent - factor)
+   *
+   * @param encoded  the encoded integer value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the decoded float value
+   */
+  public static float decodeFloat(int encoded, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return encoded / multiplier;
+  }
+
+  /**
+   * Fast rounding for float values using the magic number technique.
+   *
+   * @param value the float value to round
+   * @return the rounded integer value
+   */
+  public

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719657449


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpConstants.java:
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+/**
+ * Constants for the ALP (Adaptive Lossless floating-Point) encoding.
+ *
+ * ALP encoding converts floating-point values to integers using decimal scaling,
+ * then applies Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Based on the paper: "ALP: Adaptive Lossless floating-Point Compression" (SIGMOD 2024)
+ *
+ * @see <a href="https://dl.acm.org/doi/10.1145/3626717">ALP Paper</a>
+ */
+public final class AlpConstants {
+
+  private AlpConstants() {
+// Utility class
+  }
+
+  // == Page Header Constants ==
+
+  /** Current ALP format version */
+  public static final int ALP_VERSION = 1;
+
+  /** ALP compression mode identifier (0 = ALP) */
+  public static final int ALP_COMPRESSION_MODE = 0;
+
+  /** FOR encoding for integers (0 = FOR) */
+  public static final int ALP_INTEGER_ENCODING_FOR = 0;
+
+  /** Size of the ALP page header in bytes */
+  public static final int ALP_HEADER_SIZE = 8;
+
+  // == Vector Constants ==
+
+  /** Default number of elements per compressed vector (2^10 = 1024) */
+  public static final int ALP_VECTOR_SIZE = 1024;
+
+  /** Log2 of the default vector size */
+  public static final int ALP_VECTOR_SIZE_LOG = 10;

Review Comment:
   I think this is less important to make configurable than vector size. As we are specifically calling it out as configurable in the spec, we should make sure we generate some data at different bit-widths.
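
   For illustration only (not code from the PR): the two constants under review are tied together by `ALP_VECTOR_SIZE = 2^ALP_VECTOR_SIZE_LOG`, so a configurable variant could derive the size from its log2 rather than storing both. The class and method names below are hypothetical, and the accepted range of 0 to 30 is an assumption chosen only to keep the shift inside a positive `int`.

```java
// Hypothetical sketch, not the PR's code: derive the vector size from its log2
// so the two values cannot drift apart when the size is made configurable.
final class AlpVectorSizeSketch {

  private AlpVectorSizeSketch() {
    // Utility class
  }

  // Returns 2^vectorSizeLog, e.g. 10 -> 1024 as in the constants above.
  static int vectorSize(int vectorSizeLog) {
    if (vectorSizeLog < 0 || vectorSizeLog > 30) {
      throw new IllegalArgumentException("vectorSizeLog out of range: " + vectorSizeLog);
    }
    return 1 << vectorSizeLog;
  }

  // Number of vectors needed to hold valueCount values; the last may be partial.
  static int vectorCount(int valueCount, int vectorSizeLog) {
    if (valueCount < 0) {
      throw new IllegalArgumentException("valueCount must be non-negative: " + valueCount);
    }
    int size = vectorSize(vectorSizeLog);
    // Widen to long so the rounding-up addition cannot overflow.
    return (int) (((long) valueCount + size - 1) >>> vectorSizeLog);
  }
}
```

   A test could then loop over several log2 values and write a page for each, which is one way to exercise the non-default sizes that the spec allows.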






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719351532


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+// Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+// NaN check
+if (Float.isNaN(value)) {
+  return true;
+}
+// Infinity check
+if (Float.isInfinite(value)) {
+  return true;
+}
+// Negative zero check
+if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int factor) {
+if (isFloatException(value)) {
+  return true;
+}
+
+// Try encoding and check for round-trip failure
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+
+float scaled = value * multiplier;
+
+// Check for overflow
+if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
+  return true;
+}
+
+// Fast round
+int encoded = fastRoundFloat(scaled);
+
+// Check round-trip
+float decoded = encoded / multiplier;
+return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
+  }
+
+  /**
+   * Encode a float value to an integer using the specified exponent and factor.
+   *
+   * Formula: encoded = round(value * 10^(exponent - factor))
+   *
+   * @param value    the float value to encode
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the encoded integer value
+   */
+  public static int encodeFloat(float value, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return fastRoundFloat(value * multiplier);
+  }
+
+  /**
+   * Decode an integer back to a float using the specified exponent and factor.
+   *
+   * Formula: value = encoded / 10^(exponent - factor)
+   *
+   * @param encoded  the encoded integer value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the decoded float value
+   */
+  public static float decodeFloat(int encoded, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return encoded / multiplier;
+  }
+
+  /**
+   * Fast rounding for float values using the magic number technique.
+   *
+   * @param value the float value to round
+   * @return the rounded integer value
+   */
+  public

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719349914


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+// Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+// NaN check
+if (Float.isNaN(value)) {
+  return true;
+}
+// Infinity check
+if (Float.isInfinite(value)) {
+  return true;
+}
+// Negative zero check
+if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int factor) {
+if (isFloatException(value)) {
+  return true;
+}
+
+// Try encoding and check for round-trip failure
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+
+float scaled = value * multiplier;
+
+// Check for overflow
+if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
+  return true;
+}
+
+// Fast round
+int encoded = fastRoundFloat(scaled);
+
+// Check round-trip
+float decoded = encoded / multiplier;
+return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
+  }
+
+  /**
+   * Encode a float value to an integer using the specified exponent and factor.
+   *
+   * Formula: encoded = round(value * 10^(exponent - factor))
+   *
+   * @param value    the float value to encode
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the encoded integer value
+   */
+  public static int encodeFloat(float value, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return fastRoundFloat(value * multiplier);
+  }
+
+  /**
+   * Decode an integer back to a float using the specified exponent and factor.
+   *
+   * Formula: value = encoded / 10^(exponent - factor)
+   *
+   * @param encoded  the encoded integer value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the decoded float value
+   */
+  public static float decodeFloat(int encoded, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return encoded / multiplier;
+  }
+
+  /**
+   * Fast rounding for float values using the magic number technique.
+   *
+   * @param value the float value to round
+   * @return the rounded integer value
+   */
+  public

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719346698


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+// Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+// NaN check
+if (Float.isNaN(value)) {
+  return true;
+}
+// Infinity check
+if (Float.isInfinite(value)) {
+  return true;
+}
+// Negative zero check
+if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+  return true;
+}
+return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int factor) {
+if (isFloatException(value)) {
+  return true;
+}
+
+// Try encoding and check for round-trip failure
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+
+float scaled = value * multiplier;
+
+// Check for overflow
+if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
+  return true;
+}
+
+// Fast round
+int encoded = fastRoundFloat(scaled);
+
+// Check round-trip
+float decoded = encoded / multiplier;
+return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
+  }
+
+  /**
+   * Encode a float value to an integer using the specified exponent and factor.
+   *
+   * Formula: encoded = round(value * 10^(exponent - factor))
+   *
+   * @param value    the float value to encode
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the encoded integer value
+   */
+  public static int encodeFloat(float value, int exponent, int factor) {
+float multiplier = FLOAT_POW10[exponent];
+if (factor > 0) {
+  multiplier /= FLOAT_POW10[factor];
+}
+return fastRoundFloat(value * multiplier);
+  }
+
+  /**
+   * Decode an integer back to a float using the specified exponent and factor.
+   *
+   * Formula: value = encoded / 10^(exponent - factor)
+   *
+   * @param encoded  the encoded integer value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the decoded float value
+   */
+  public static float decodeFloat(int encoded, int exponent, int factor) {
+    float multiplier = FLOAT_POW10[exponent];
+    if (factor > 0) {
+      multiplier /= FLOAT_POW10[factor];
+    }
+    return encoded / multiplier;
+  }
+
+  /**
+   * Fast rounding for float values using the magic number technique.
+   *
+   * @param value the float value to round
+   * @return the rounded integer value
+   */
+  public

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719348340


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+    // Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+    // NaN check
+    if (Float.isNaN(value)) {
+      return true;
+    }
+    // Infinity check
+    if (Float.isInfinite(value)) {
+      return true;
+    }
+    // Negative zero check
+    if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+      return true;
+    }
+    return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int factor) {
+    if (isFloatException(value)) {
+      return true;
+    }
+
+    // Try encoding and check for round-trip failure
+    float multiplier = FLOAT_POW10[exponent];
+    if (factor > 0) {
+      multiplier /= FLOAT_POW10[factor];
+    }
+
+    float scaled = value * multiplier;
+
+    // Check for overflow
+    if (scaled > Integer.MAX_VALUE || scaled < Integer.MIN_VALUE) {
+      return true;
+    }
+
+    // Fast round
+    int encoded = fastRoundFloat(scaled);
+
+    // Check round-trip
+    float decoded = encoded / multiplier;
+    return Float.floatToRawIntBits(value) != Float.floatToRawIntBits(decoded);
+  }
+
+  /**
+   * Encode a float value to an integer using the specified exponent and factor.
+   *
+   * Formula: encoded = round(value * 10^(exponent - factor))
+   *
+   * @param value    the float value to encode
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the encoded integer value
+   */
+  public static int encodeFloat(float value, int exponent, int factor) {
+    float multiplier = FLOAT_POW10[exponent];
+    if (factor > 0) {
+      multiplier /= FLOAT_POW10[factor];
+    }
+    return fastRoundFloat(value * multiplier);
+  }
+
+  /**
+   * Decode an integer back to a float using the specified exponent and factor.
+   *
+   * Formula: value = encoded / 10^(exponent - factor)
+   *
+   * @param encoded  the encoded integer value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return the decoded float value
+   */
+  public static float decodeFloat(int encoded, int exponent, int factor) {
+    float multiplier = FLOAT_POW10[exponent];
+    if (factor > 0) {
+      multiplier /= FLOAT_POW10[factor];
+    }
+    return encoded / multiplier;
+  }
+
+  /**
+   * Fast rounding for float values using the magic number technique.
+   *
+   * @param value the float value to round
+   * @return the rounded integer value
+   */
+  public

Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719341086


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpEncoderDecoder.java:
##
@@ -0,0 +1,384 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+import static org.apache.parquet.column.values.alp.AlpConstants.*;
+
+/**
+ * Core ALP (Adaptive Lossless floating-Point) encoding and decoding logic.
+ *
+ * ALP works by converting floating-point values to integers using decimal scaling,
+ * then applying Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Encoding formula: encoded = round(value * 10^(exponent - factor))
+ * Decoding formula: value = encoded / 10^(exponent - factor)
+ *
+ * Exception conditions:
+ * 
+ *   NaN values
+ *   Infinity values
+ *   Negative zero (-0.0)
+ *   Out of integer range
+ *   Round-trip failure (decode(encode(v)) != v)
+ * 
+ */
+public final class AlpEncoderDecoder {
+
+  private AlpEncoderDecoder() {
+    // Utility class
+  }
+
+  // == Float Encoding/Decoding ==
+
+  /**
+   * Check if a float value is an exception (cannot be losslessly encoded).
+   *
+   * @param value the float value to check
+   * @return true if the value is an exception
+   */
+  public static boolean isFloatException(float value) {
+    // NaN check
+    if (Float.isNaN(value)) {
+      return true;
+    }
+    // Infinity check
+    if (Float.isInfinite(value)) {
+      return true;
+    }
+    // Negative zero check
+    if (Float.floatToRawIntBits(value) == FLOAT_NEGATIVE_ZERO_BITS) {
+      return true;
+    }
+    return false;
+  }
+
+  /**
+   * Check if a float value will be an exception for the given exponent/factor.
+   *
+   * @param value    the float value
+   * @param exponent the decimal exponent (0-10)
+   * @param factor   the decimal factor (0 <= factor <= exponent)
+   * @return true if the value is an exception for this encoding
+   */
+  public static boolean isFloatException(float value, int exponent, int factor) {
+    if (isFloatException(value)) {
+      return true;
+    }
+
+    // Try encoding and check for round-trip failure
+    float multiplier = FLOAT_POW10[exponent];
+    if (factor > 0) {
Review Comment:
   I'm not sure I understand the `> 0` check; I need to examine the encoding logic in more detail. Is this just an optimization to avoid dividing by 1?
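
One way to check the question above is to compare the guarded and unguarded multiplier computations bit for bit. The sketch below is not code from the PR: `FactorCheckSketch` is a hypothetical class, and its `FLOAT_POW10` table is an assumed stand-in for the one in `AlpConstants`, which the quoted diff does not show. Under that assumption, and because IEEE 754 division by `1.0f` is exact, the two variants agree for every valid (exponent, factor) pair, so the guard only skips a redundant division.

```java
final class FactorCheckSketch {
  // Assumed stand-in for AlpConstants.FLOAT_POW10 (not shown in the quoted diff).
  static final float[] FLOAT_POW10 = {
    1e0f, 1e1f, 1e2f, 1e3f, 1e4f, 1e5f, 1e6f, 1e7f, 1e8f, 1e9f, 1e10f
  };

  private FactorCheckSketch() {}

  // Multiplier as computed in the quoted diff, with the factor > 0 guard.
  static float guardedMultiplier(int exponent, int factor) {
    float multiplier = FLOAT_POW10[exponent];
    if (factor > 0) {
      multiplier /= FLOAT_POW10[factor];
    }
    return multiplier;
  }

  // Same computation without the guard: always divides, even by 10^0 = 1.
  static float unguardedMultiplier(int exponent, int factor) {
    return FLOAT_POW10[exponent] / FLOAT_POW10[factor];
  }

  // Returns true if both variants produce identical bits for all 0 <= factor <= exponent.
  static boolean guardIsPureOptimization() {
    for (int e = 0; e < FLOAT_POW10.length; e++) {
      for (int f = 0; f <= e; f++) {
        int guarded = Float.floatToRawIntBits(guardedMultiplier(e, f));
        int unguarded = Float.floatToRawIntBits(unguardedMultiplier(e, f));
        if (guarded != unguarded) {
          return false;
        }
      }
    }
    return true;
  }
}
```

If the real table starts with `1.0f` at index 0, as assumed here, the guard could be dropped without changing results; whether it is worth keeping is a performance question only.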






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719334713


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpConstants.java:
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+/**
+ * Constants for the ALP (Adaptive Lossless floating-Point) encoding.
+ *
+ * ALP encoding converts floating-point values to integers using decimal scaling,
+ * then applies Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Based on the paper: "ALP: Adaptive Lossless floating-Point Compression" (SIGMOD 2024)
+ *
+ * @see <a href="https://dl.acm.org/doi/10.1145/3626717">ALP Paper</a>
+ */
+public final class AlpConstants {
+
+  private AlpConstants() {
+    // Utility class
+  }
+
+  // == Page Header Constants ==
+
+  /** Current ALP format version */
+  public static final int ALP_VERSION = 1;
+
+  /** ALP compression mode identifier (0 = ALP) */
+  public static final int ALP_COMPRESSION_MODE = 0;
+
+  /** FOR encoding for integers (0 = FOR) */
+  public static final int ALP_INTEGER_ENCODING_FOR = 0;
+
+  /** Size of the ALP page header in bytes */
+  public static final int ALP_HEADER_SIZE = 8;
+
+  // == Vector Constants ==
+
+  /** Default number of elements per compressed vector (2^10 = 1024) */
+  public static final int ALP_VECTOR_SIZE = 1024;
+
+  /** Log2 of the default vector size */
+  public static final int ALP_VECTOR_SIZE_LOG = 10;
+
+  // == Exponent/Factor Limits ==
+
+  /** Maximum exponent for float encoding (10^10 ~ 10 billion) */
+  public static final int FLOAT_MAX_EXPONENT = 10;
+
+  /** Maximum exponent for double encoding (10^18 ~ 1 quintillion) */
+  public static final int DOUBLE_MAX_EXPONENT = 18;
+
+  /** Number of (exponent, factor) combinations for float: sum(1..11) = 66 */
+  public static final int FLOAT_COMBINATIONS = 66;
+
+  /** Number of (exponent, factor) combinations for double: sum(1..19) = 190 */
+  public static final int DOUBLE_COMBINATIONS = 190;
+
+  // == Sampling Constants ==
+
+  /** Number of values sampled per vector */
+  public static final int SAMPLER_SAMPLES_PER_VECTOR = 256;

Review Comment:
   I wonder whether these should be configurable somehow? It is probably OK if not.
   
   Can these be package-private?
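
To make the two suggestions concrete, here is a minimal sketch of a package-private, configurable sampling setting. Everything in it is hypothetical: the class `AlpSamplerConfigSketch`, its factory methods, and the `sampleStride` helper are illustrative names rather than part of the PR, and only the default of 256 is taken from the quoted constant `SAMPLER_SAMPLES_PER_VECTOR`.

```java
// Package-private on purpose: no 'public' modifier on the class or its members.
final class AlpSamplerConfigSketch {
  // Default mirroring SAMPLER_SAMPLES_PER_VECTOR from the quoted diff.
  static final int DEFAULT_SAMPLES_PER_VECTOR = 256;

  private final int samplesPerVector;

  private AlpSamplerConfigSketch(int samplesPerVector) {
    this.samplesPerVector = samplesPerVector;
  }

  // Configuration using the default sample count.
  static AlpSamplerConfigSketch defaults() {
    return new AlpSamplerConfigSketch(DEFAULT_SAMPLES_PER_VECTOR);
  }

  // Configuration with a caller-supplied sample count, validated against the vector size.
  static AlpSamplerConfigSketch withSamplesPerVector(int samples, int vectorSize) {
    if (samples < 1 || samples > vectorSize) {
      throw new IllegalArgumentException(
          "samples must be in [1, " + vectorSize + "], got " + samples);
    }
    return new AlpSamplerConfigSketch(samples);
  }

  int samplesPerVector() {
    return samplesPerVector;
  }

  // Hypothetical helper: distance between sampled positions for evenly spaced sampling.
  int sampleStride(int vectorSize) {
    return Math.max(1, vectorSize / samplesPerVector);
  }
}
```

With this shape, callers inside the package could use `AlpSamplerConfigSketch.defaults()` while tests try other sample counts, and nothing leaks into the public API.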






Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719332440


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpConstants.java:
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+/**
+ * Constants for the ALP (Adaptive Lossless floating-Point) encoding.
+ *
+ * ALP encoding converts floating-point values to integers using decimal scaling,
+ * then applies Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Based on the paper: "ALP: Adaptive Lossless floating-Point Compression" (SIGMOD 2024)
+ *
+ * @see <a href="https://dl.acm.org/doi/10.1145/3626717">ALP Paper</a>
+ */
+public final class AlpConstants {
+
+  private AlpConstants() {
+    // Utility class
+  }
+
+  // == Page Header Constants ==
+
+  /** Current ALP format version */
+  public static final int ALP_VERSION = 1;
+
+  /** ALP compression mode identifier (0 = ALP) */
+  public static final int ALP_COMPRESSION_MODE = 0;
+
+  /** FOR encoding for integers (0 = FOR) */
+  public static final int ALP_INTEGER_ENCODING_FOR = 0;
+
+  /** Size of the ALP page header in bytes */
+  public static final int ALP_HEADER_SIZE = 8;
+
+  // == Vector Constants ==
+
+  /** Default number of elements per compressed vector (2^10 = 1024) */
+  public static final int ALP_VECTOR_SIZE = 1024;

Review Comment:
   nit: this is the default, but it should be configurable.



##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpConstants.java:
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.column.values.alp;
+
+/**
+ * Constants for the ALP (Adaptive Lossless floating-Point) encoding.
+ *
+ * ALP encoding converts floating-point values to integers using decimal scaling,
+ * then applies Frame of Reference (FOR) encoding and bit-packing.
+ * Values that cannot be losslessly converted are stored as exceptions.
+ *
+ * Based on the paper: "ALP: Adaptive Lossless floating-Point Compression" (SIGMOD 2024)
+ *
+ * @see <a href="https://dl.acm.org/doi/10.1145/3626717">ALP Paper</a>
+ */
+public final class AlpConstants {
+
+  private AlpConstants() {
+    // Utility class
+  }
+
+  // == Page Header Constants ==
+
+  /** Current ALP format version */
+  public static final int ALP_VERSION = 1;
+
+  /** ALP compression mode identifier (0 = ALP) */
+  public static final int ALP_COMPRESSION_MODE = 0;
+
+  /** FOR encoding for integers (0 = FOR) */
+  public static final int ALP_INTEGER_ENCODING_FOR = 0;
+
+  /** Size of the ALP page header in bytes */
+  public static final int ALP_HEADER_SIZE = 8;
+
+  // == Vector Constants ==
+
+  /** Default number of elements per compressed vector (2^10 = 1024) */
+  public static final int ALP_VECTOR_SIZE = 1024;
+
+  /** Log2 of the default vector size */
+  public static final int ALP_VECTOR_SIZE_LOG = 10;

Review Comment:
   Same comment about making this configurable.
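
If the vector size becomes configurable, the log2 value need not be a second constant; it could be derived from the size. The following is a rough sketch under two assumptions that the quoted diff does not confirm: that vector sizes are restricted to powers of two, and that a small holder class (here the hypothetical `AlpVectorSizeSketch`) carries the setting.

```java
final class AlpVectorSizeSketch {
  // Default mirroring ALP_VECTOR_SIZE from the quoted diff (2^10 = 1024).
  static final int DEFAULT_VECTOR_SIZE = 1024;

  private final int vectorSize;
  private final int vectorSizeLog;

  AlpVectorSizeSketch(int vectorSize) {
    // Assumption: only positive powers of two are valid vector sizes.
    if (vectorSize <= 0 || Integer.bitCount(vectorSize) != 1) {
      throw new IllegalArgumentException(
          "vectorSize must be a positive power of two, got " + vectorSize);
    }
    this.vectorSize = vectorSize;
    // For a power of two, the number of trailing zero bits equals log2.
    this.vectorSizeLog = Integer.numberOfTrailingZeros(vectorSize);
  }

  int vectorSize() {
    return vectorSize;
  }

  int vectorSizeLog() {
    return vectorSizeLog;
  }

  // Number of vectors needed to hold valueCount values (ceiling division via shift).
  int vectorCount(int valueCount) {
    return (valueCount + vectorSize - 1) >>> vectorSizeLog;
  }
}
```

Deriving the log this way keeps the two values from drifting apart when the size changes, which a pair of independent constants cannot guarantee.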




Re: [PR] Add ALP (Adaptive Lossless floating-Point) encoding support [parquet-java]

2026-01-22 Thread via GitHub


emkornfield commented on code in PR #3390:
URL: https://github.com/apache/parquet-java/pull/3390#discussion_r2719331667


##
parquet-column/src/main/java/org/apache/parquet/column/values/alp/AlpConstants.java:
##
@@ -0,0 +1,136 @@
+/*

Review Comment:
   There was a recent comment about AI-generated code on Arrow; if we aren't doing a lot of edits, I'm not sure how it should be licensed.


