Re: 1brc Challenge

Enrique Herrera Noya Tue, 09 Jan 2024 03:39:26 -0800

Hi,
I covered these topics for my degree exam. I see two levels here:

1. Tuning Postgres itself (for parallel requests, index usage, etc.).

Others have already given you some points to consider there...

2. Writing efficient Java code.

I'm sharing what I wrote as a proof of concept, to give you an idea.

(The first version was a substantial improvement over the inefficient code.)

The second was faster still, but used more memory (on the machine where Java runs).



Efficient code without threads
...

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;

public class EfficientProcessingExample {
    public static void main(String[] args) {
        String jdbcUrl = "jdbc:postgresql://localhost:5432/carrera";
        String username = "admin";
        String password = "l3r0l3r0";

        // try-with-resources closes the statement and the connection even if an update fails
        try (Connection connection = DriverManager.getConnection(jdbcUrl, username, password);
             PreparedStatement preparedStatement = connection.prepareStatement(
                     "UPDATE curso SET promedio = ?, aprobado = ? WHERE indice = ?;")) {

            int totalRecords = 40000;
            Random random = new Random();

            long startTime = System.currentTimeMillis();

            for (int idx = 1; idx <= totalRecords; idx++) {
                // Three random grades between 1 and 7
                int nota1 = random.nextInt(7) + 1;
                int nota2 = random.nextInt(7) + 1;
                int nota3 = random.nextInt(7) + 1;
                double promedio = (nota1 + nota2 + nota3) / 3.0;
                boolean aprobado = promedio > 4.0;

                preparedStatement.setDouble(1, promedio);
                preparedStatement.setBoolean(2, aprobado);
                preparedStatement.setInt(3, idx);
                preparedStatement.addBatch(); // Add the update to the batch

                if (idx % 1000 == 0) {
                    preparedStatement.executeBatch(); // Send the batch every 1000 records
                    preparedStatement.clearBatch();   // Empty the batch
                }
            }
            preparedStatement.executeBatch(); // Send any updates left in the batch

            long endTime = System.currentTimeMillis();
            long elapsedTime = endTime - startTime;
            System.out.println("Execution time: " + elapsedTime + " ms");

            // Report heap memory usage in a readable unit
            MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
            long usedMemory = memoryMXBean.getHeapMemoryUsage().getUsed();
            String sizeUnit;
            double sizeValue;
            if (usedMemory < 1024) {
                sizeValue = usedMemory;
                sizeUnit = "bytes";
            } else if (usedMemory < 1024 * 1024) {
                sizeValue = (double) usedMemory / 1024;
                sizeUnit = "KB";
            } else if (usedMemory < 1024 * 1024 * 1024) {
                sizeValue = (double) usedMemory / (1024 * 1024);
                sizeUnit = "MB";
            } else {
                sizeValue = (double) usedMemory / (1024 * 1024 * 1024);
                sizeUnit = "GB";
            }
            System.out.println("Memory usage: " + sizeValue + " " + sizeUnit);
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}



...
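
The batching pattern above (addBatch for each row, executeBatch every 1000 rows, and one last executeBatch for the remainder) can be sketched without a database. BatchBuffer is a hypothetical helper, not part of JDBC; the sink stands in for executeBatch, and the 40500-row count is chosen only to show the leftover partial batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers items and hands them to a sink in groups of batchSize. */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // stand-in for executeBatch()
    private final List<T> pending = new ArrayList<>();
    private int flushes = 0;

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Like addBatch(): queue one item and flush once the batch is full. */
    void add(T item) {
        pending.add(item);
        if (pending.size() == batchSize) {
            flush();
        }
    }

    /** Like executeBatch() followed by clearBatch(); does nothing when empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
        flushes++;
    }

    int flushes() {
        return flushes;
    }

    public static void main(String[] args) {
        int[] rows = {0};
        BatchBuffer<Integer> buffer = new BatchBuffer<>(1000, batch -> rows[0] += batch.size());
        for (int idx = 1; idx <= 40500; idx++) {
            buffer.add(idx);
        }
        buffer.flush(); // the final flush picks up the last 500 rows
        System.out.println(buffer.flushes() + " flushes, " + rows[0] + " rows");
        // prints "41 flushes, 40500 rows"
    }
}
```

With 40000 rows and a batch size of 1000, the same loop would flush 40 times, compared with 40000 separate executeUpdate calls in the inefficient version.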

Efficient code with threads

...
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;

public class EfficientProcessingWithThreadsExample {
    public static void main(String[] args) {
        String jdbcUrl = "jdbc:postgresql://localhost:5432/carrera";
        String username = "admin";
        String password = "l3r0l3r0";
        try (Connection connection = DriverManager.getConnection(jdbcUrl, username, password)) {
            int totalRecords = 40000;
            // One thread per record, all sharing the same connection
            EfficientProcessorThread[] threads = new EfficientProcessorThread[totalRecords];
            // Start the clock before launching the threads so their updates are timed
            long startTime = System.currentTimeMillis();
            for (int idx = 1; idx <= totalRecords; idx++) {
                threads[idx - 1] = new EfficientProcessorThread(idx, connection);
                threads[idx - 1].start();
            }
            for (EfficientProcessorThread thread : threads) {
                thread.join();
            }
            long endTime = System.currentTimeMillis();
            long elapsedTime = endTime - startTime;
            // Report heap memory usage in a readable unit
            MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
            long usedMemory = memoryMXBean.getHeapMemoryUsage().getUsed();
            String sizeUnit;
            double sizeValue;
            if (usedMemory < 1024) {
                sizeValue = usedMemory;
                sizeUnit = "bytes";
            } else if (usedMemory < 1024 * 1024) {
                sizeValue = (double) usedMemory / 1024;
                sizeUnit = "KB";
            } else if (usedMemory < 1024 * 1024 * 1024) {
                sizeValue = (double) usedMemory / (1024 * 1024);
                sizeUnit = "MB";
            } else {
                sizeValue = (double) usedMemory / (1024 * 1024 * 1024);
                sizeUnit = "GB";
            }
            System.out.println("Memory usage: " + sizeValue + " " + sizeUnit);
            System.out.println("Execution time: " + elapsedTime + " ms");
        } catch (SQLException | InterruptedException e) {
            e.printStackTrace();
        }
    }
}

class EfficientProcessorThread extends Thread {
    private final int idx;
    private final Connection connection;
    private final Random random;

    public EfficientProcessorThread(int idx, Connection connection) {
        this.idx = idx;
        this.connection = connection;
        this.random = new Random();
    }

    @Override
    public void run() {
        // Three random grades between 1 and 7
        int nota1 = random.nextInt(7) + 1;
        int nota2 = random.nextInt(7) + 1;
        int nota3 = random.nextInt(7) + 1;
        double promedio = (nota1 + nota2 + nota3) / 3.0;
        boolean aprobado = promedio > 4.0;
        // try-with-resources closes the statement once the update has run
        try (PreparedStatement preparedStatement = connection.prepareStatement(
                "UPDATE curso SET promedio = ?, aprobado = ? WHERE indice = ?;")) {
            preparedStatement.setDouble(1, promedio);
            preparedStatement.setBoolean(2, aprobado);
            preparedStatement.setInt(3, idx);
            preparedStatement.executeUpdate();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}


...
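
A possible middle ground between the two versions above, offered as an assumption rather than something measured in this thread: a fixed pool of workers, each handling a chunk of 1000 records, instead of one thread per record. PooledChunkExample, chunks, processChunk, and the choice of 4 workers are all hypothetical; processChunk only counts rows, where a real worker would open its own connection and run a batched UPDATE like the one in the first example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: a bounded thread pool working through chunks of records. */
final class PooledChunkExample {

    /** Splits 1..totalRecords into inclusive {first, last} ranges of at most chunkSize. */
    static List<int[]> chunks(int totalRecords, int chunkSize) {
        List<int[]> ranges = new ArrayList<>();
        for (int first = 1; first <= totalRecords; first += chunkSize) {
            int last = Math.min(first + chunkSize - 1, totalRecords);
            ranges.add(new int[] {first, last});
        }
        return ranges;
    }

    /** Stand-in for the database work on one chunk; returns the rows handled. */
    static int processChunk(int first, int last) {
        int handled = 0;
        for (int idx = first; idx <= last; idx++) {
            // A real worker would bind the parameters and call addBatch() here,
            // then executeBatch() once after the loop.
            handled++;
        }
        return handled;
    }

    /** Runs every chunk on a pool of the given size and returns the total rows handled. */
    static int run(int totalRecords, int chunkSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int[] range : chunks(totalRecords, chunkSize)) {
                results.add(pool.submit(() -> processChunk(range[0], range[1])));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(40000, 1000, 4) + " rows processed");
        // prints "40000 rows processed"
    }
}
```

This keeps at most 4 threads alive instead of 40000, so memory use should stay closer to the non-threaded version; whether it is actually faster would have to be measured against the real table.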

The inefficient code:
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;

public class InefficientProcessingExample {
    public static void main(String[] args) {
        String jdbcUrl = "jdbc:postgresql://localhost:5432/carrera";
        String username = "admin";
        String password = "l3r0l3r0";

        // try-with-resources closes the statement and the connection even if an update fails
        try (Connection connection = DriverManager.getConnection(jdbcUrl, username, password);
             PreparedStatement preparedStatement = connection.prepareStatement(
                     "UPDATE curso SET promedio = ?, aprobado = ? WHERE indice = ?;")) {

            int totalRecords = 40000;
            Random random = new Random();

            long startTime = System.currentTimeMillis();

            for (int idx = 1; idx <= totalRecords; idx++) {
                int nota1 = random.nextInt(7) + 1;
                int nota2 = random.nextInt(7) + 1;
                int nota3 = random.nextInt(7) + 1;
                double promedio = (nota1 + nota2 + nota3) / 3.0;
                boolean aprobado = promedio > 4.0;

                preparedStatement.setDouble(1, promedio);
                preparedStatement.setBoolean(2, aprobado);
                preparedStatement.setInt(3, idx);
                preparedStatement.executeUpdate(); // One round trip to the server per record
            }

            long endTime = System.currentTimeMillis();
            long elapsedTime = endTime - startTime;
            System.out.println("Execution time: " + elapsedTime + " ms");

            // Report heap memory usage in a readable unit
            MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
            long usedMemory = memoryMXBean.getHeapMemoryUsage().getUsed();
            String sizeUnit;
            double sizeValue;

            if (usedMemory < 1024) {
                sizeValue = usedMemory;
                sizeUnit = "bytes";
            } else if (usedMemory < 1024 * 1024) {
                sizeValue = (double) usedMemory / 1024;
                sizeUnit = "KB";
            } else if (usedMemory < 1024 * 1024 * 1024) {
                sizeValue = (double) usedMemory / (1024 * 1024);
                sizeUnit = "MB";
            } else {
                sizeValue = (double) usedMemory / (1024 * 1024 * 1024);
                sizeUnit = "GB";
            }
            System.out.println("Memory usage: " + sizeValue + " " + sizeUnit);
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

....









On 08-01-24 at 23:59, Jairo Graterón wrote:

Greetings, list
There is a challenge to write a Java algorithm that reads temperature measurements from a text file and calculates the minimum, mean, and maximum temperature per weather station: https://www.morling.dev/blog/one-billion-row-challenge/
But implementations are also being written in other languages and, of course, in databases, for example https://ftisiot.net/posts/1brows/ and https://rmoff.net/2024/01/03/1%EF%B8%8F%E2%83%A3%EF%B8%8F-1brc-in-sql-with-duckdb/
I have already inserted the one billion rows on my machine, and when I run the query
image.png
it takes almost 2 minutes, so I kept looking into how to improve the time, and when I found these other tests https://gist.github.com/FranckPachot/50a6a491b85b0ddb3da6399d54653085 this line caught my attention:
select/*+ parallel(8) gather_plan_statistics*/
Looking into it, Postgres has a parameter to increase the number of parallel workers when a query needs them: max_parallel_workers_per_gather
image.png
That improved things a lot: 40 seconds less.
*What other optimizations could be made in Postgres to reduce the time?*
With Apache Pinot it takes approx. 1.9 s
https://hubertdulay.substack.com/p/1-billion-row-challenge-in-apache?r=46sqk&utm_campaign=post&utm_medium=web
Another took 20 seconds
https://twitter.com/_TylerHillery/status/1742971310123487429
Of course that depends on the machine's specifications, but it would be interesting for you to share your experiences.
My machine's specifications are:
Ryzen 5, 6 cores/12 threads at 3.0 GHz
KINGSTON NVMe disk
Ubuntu 22.04
PostgreSQL 14


Enrique Herrera Noya
--
+56 992303151
Red Hat Certified Engineer RHCE Nº100223072 (RH6.0)
Red Hat Certified System Administrator RHCSA Nº100223072 (RH6.0)
Red Hat Certified Technician (RHCT) Nº605010753835478 (RH5.0)
Novell Certified Linux Professional CLP 10
Red Hat Delivery Specialist -Container Platform Application Deployment I
Red Hat Delivery Specialist - Container Platform Administration I
RED HAT SPECIALIST
How to Sell Red Hat OpenShift for Infrastructure
How to Sell Red Hat OpenShift for Developers
Red Hat Sales Engineer Specialist - Container Platform
Red Hat Sales Engineer Specialist – Automation
